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COMPLEMENTARY DNAs 

Related Applications 
[0001] The present application is a divisional application of United States 

5 Patent Application Serial No. 09/247,155 filed on February 9, 1999, which claims priority 
from United States Provisional Patent Application Serial No. 60/074,121 filed February 9, 
1998, United States Provisional Patent Application Serial No. 60/081,563, filed April 13, 
1998, United States Provisional Patent Application Serial No. 60/096,1 16, filed August 10, 
1998, and United States Provisional Patent Application Serial No. 60/099,273 filed 
10 September 4, 1998, the disclosures of which are incorporated herein by reference in their 
entirety. 

[0002] Table I lists the SEQ ID Nos. of the extended cDNAs in the present 

application, the SEQ ID Nos. of the extended cDNAs in the provisional applications, and 
the identities of the provisional applications in which the extended cDNAs were disclosed. 

15 

Background of the Invention 

[0003] The estimated 50,000-100,000 genes scattered along the human 

chromosomes offer tremendous promise for the understanding, diagnosis, and treatment of 
human diseases, hi addition, probes capable of specifically hybridizing to loci distributed 
20 throughout the human genome find applications in the construction of high resolution 
chromosome maps and in the identification of individuals. 

[0004] In the past, the characterization of even a single human gene was a 

painstaking process, requiring years of effort. Recent developments in the areas of cloning 
vectors, DNA sequencing, and computer technology have merged to greatly accelerate the 

25 rate at which human genes can be isolated, sequenced, mapped, and characterized. 

Cloning vectors such as yeast artificial chromosomes (YACs) and bacterial artificial 
chromosomes (BACs) are able to accept DNA inserts ranging from 300 to 1000 kilobases 
(kb) or 100-400 kb in length respectively, thereby facilitating the manipulation and 
ordering of DNA sequences distributed over great distances on the human chromosomes. 

30 Automated DNA sequencing machines permit the rapid sequencing of human genes. 



-1- 



36.US3.DIV 

Bioinfortnatics software enables the comparison of nucleic acid and protein sequences, 
thereby assisting in the characterization of human gene products, 

[0005] Currently, two different approaches are being pursued for 

identifying and characterizing the genes distributed along the human genome, hi one 
5 approach, large fragments of genomic DNA are isolated, cloned, and sequenced. Potential 
open reading frames in these genomic sequences are identified using bio-informatics 
software. However, this approach entails sequencing large stretches of human DNA 
which do not encode proteins in order to find the protein encoding sequences scattered 
throughout the genome. In addition to requiring extensive sequencing, the bio-informatics 
10 software may mischaracterize the genomic sequences obtained. Thus, the software may 
produce false positives in which non-coding DNA is mischaracterized as coding DNA or 
false negatives in which coding DNA is mislabeled as non-coding DNA. 

[0006] An alternative approach takes a more direct route to identifying and 

characterizing human genes. In this approach, complementary DNAs (cDNAs) are 

15 synthesized from isolated messenger RNAs (mRNAs) which encode human proteins. 

Using this approach, sequencing is only performed on DNA which is derived from protein 
coding portions of the genome. Often, only short stretches of the cDNAs are sequenced to 
obtain sequences called expressed sequence tags (ESTs). The ESTs may then be used to 
isolate or purify extended cDNAs which include sequences adjacent to the EST sequences. 

20 The extended cDNAs may contain all of the sequence of the EST which was used to 
obtain them or only a portion of the sequence of the EST which was used to obtain them. 
In addition, the extended cDNAs may contain the full coding sequence of the gene from 
which the EST was derived or, alternatively, the extended cDNAs may include portions of 
the coding sequence of the gene from which the EST was derived. It will be appreciated 

25 that there may be several extended cDNAs which include the EST sequence as a result of 

alternate splicing or the activity of alternative promoters. 

[0007] In the past, the short EST sequences which could be used to isolate 

or purify extended cDNAs were often obtained from oligo-dT primed cDNA libraries. 
Accordingly, they mainly corresponded to the 3' untranslated region of the mRNA. In 
30 part, the prevalence of EST sequences derived from the 3' end of the mRNA is a result of 
the fact that typical techniques for obtaining cDNAs, are not well suited for isolating 
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cDNA sequences derived from the 5' ends of mRNAs. (Adams et al. ? Nature 311:114, 
1996, Hillieretal, Genome Res. 6:807-828, 1996). 

[0008] In addition, in those reported instances where longer cDNA 

sequences have been obtained, the reported sequences typically correspond to coding 
5 sequences and do not include the full 5' untranslated region of the mRNA from which the 

cDNA is derived. Such incomplete sequences may not include the first exon of the 
mRNA, particularly in situations where the first exon is short. Furthermore, they may not 
include some exons, often short ones, which are located upstream of splicing sites. Thus, 
there is a need to obtain sequences derived from the 5' ends of mRNAs which can be used 
10 to obtain extended cDNAs which may include the 5' sequences contained in the 5' ESTs. 

[0009] While many sequences derived from human chromosomes have 

practical applications, approaches based on the identification and characterization of those 
chromosomal sequences which encode a protein product are particularly relevant to 
diagnostic and therapeutic uses. Of the 50,000-100,000 protein coding genes, those genes 
15 encoding proteins which are secreted from the cell in which they are synthesized, as well 

as the secreted proteins themselves, are particularly valuable as potential therapeutic 
agents. Such proteins are often involved in cell to cell communication and may be 
responsible for producing a clinically relevant response in their target cells. 

[0010] In fact, several secretory proteins, including tissue plasminogen 

20 activator, G-CSF, GM-CSF, erythropoietin, human growth hormone, insulin, interferon-a, 

interferon-p, interferon-y, and interleukin-2, are currently in clinical use. These proteins 
are used to treat a wide range of conditions, including acute myocardial infarction, acute 
ischemic stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney 
carcinoma, chemotherapy induced neutropenia and multiple sclerosis. For these reasons, 
25 extended cDNAs encoding secreted proteins or portions thereof represent a particularly 

valuable source of therapeutic agents. Thus, there is a need for the identification and 
characterization of secreted proteins and the nucleic acids encoding them. 

[0011] In addition to being therapeutically useful themselves, secretory 

proteins include short peptides, called signal peptides, at their amino termini which direct 
30 their secretion. These signal peptides are encoded by the signal sequences located at the 5' 
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ends of the coding sequences of genes encoding secreted proteins. Because these signal 
peptides will direct the extracellular secretion of any protein to which they are operably 
linked, the signal sequences may be exploited to direct the efficient secretion of any 
protein by operably linking the signal sequences to a gene encoding the protein for which 

5 secretion is desired. This may prove beneficial in gene therapy strategies in which it is 

desired to deliver a particular gene product to cells other than the cell in which it is 
produced. Signal sequences encoding signal peptides also find application in simplifying 
protein purification techniques. In such applications, the extracellular secretion of the 
desired protein greatly facilitates purification by reducing the number of undesired proteins 

10 from which the desired protein must be selected. Thus, there exists a need to identify and 

characterize the 5' portions of the genes for secretory proteins which encode signal 
peptides. 

[0012] Public information on the number of human genes for which the 

promoters and upstream regulatory regions have been identified and characterized is quite 

15 limited. In part, this may be due to the difficulty of isolating such regulatory sequences. 

Upstream regulatory sequences such as transcription factor binding sites are typically too 
short to be utilized as probes for isolating promoters from human genomic libraries. 
Recently, some approaches have been developed to isolate human promoters. One of them 
consists of making a CpG island library (Cross, S.H. et al., Purification of CpG Islands 

20 using a Methylated DNA Binding Column, Nature Genetics 6: 236-244 (1994)). The 
second consists of isolating human genomic DNA sequences containing Spel binding sites 
by the use of Spel binding protein. (Mortlock et al., Genome Res, 6:327-335, 1996). Both 
of these approaches have their limits due to a lack of specificity or of comprehensiveness. 

[0013] 5' ESTs and extended cDNAs obtainable therefrom may be used to 

25 efficiently identify and isolate upstream regulatory regions which control the location, 
developmental stage, rate, and quantity of protein synthesis, as well as the stability of the 
mRNA. (Theil et al, BioFactors 4:87-93, (1993). Once identified and characterized, these 
regulatory regions may be utilized in gene therapy or protein purification schemes to 
obtain the desired amount and locations of protein synthesis or to inhibit, reduce, or 
30 prevent the synthesis of undesirable gene products. 
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[0014] In addition, ESTs containing the 5' ends of secretory protein genes 

or extended cDNAs which include sequences adjacent to the sequences of the ESTs may 
include sequences useful as probes for chromosome mapping and the identification of 
individuals. Thus, there is a need to identify and characterize the sequences upstream of 
the 5' coding sequences of genes encoding secretory proteins. 

Summary of the Invention 

[0015] The present invention relates to purified, isolated, or recombinant 

extended cDNAs which encode secreted proteins or fragments thereof. Preferably, the 
purified, isolated or recombinant cDNAs contain the entire open reading frame of their 
corresponding mRNAs, including a start codon and a stop codon. For example, the 
extended cDNAs may include nucleic acids encoding the signal peptide as well as the 
mature protein. Alternatively, the extended cDNAs may contain a fragment of the open 
reading frame. In some embodiments, the fragment may encode only the sequence of the 
mature protein. Alternatively, the fragment may encode only a portion of the mature 
protein. A further aspect of the present invention is a nucleic acid which encodes the 
signal peptide of a secreted protein. 

[0016] The present extended cDNAs were obtained using ESTs which 

include sequences derived from the authentic 5' ends of their corresponding mRNAs. As 
used herein the terms "EST" or "5' EST" refer to the short cDNAs which were used to 
obtain the extended cDNAs of the present invention. As used herein, the term "extended 
cDNA" refers to the cDNAs which include sequences adjacent to the 5' EST used to obtain 
them. The extended cDNAs may contain all or a portion of the sequence of the EST 
which was used to obtain them. The term "corresponding mRNA" refers to the mRNA 
which was the template for the cDNA synthesis which produced the 5' EST. As used 
herein, the term "purified" does not require absolute purity; rather, it is intended as a 
relative definition. Individual extended cDNA clones isolated from a cDNA library have 
been conventionally purified to electrophoretic homogeneity. The sequences obtained 
from these clones could not be obtained directly either from the library or from total 
human DNA. The extended cDNA clones are not naturally occurring as such, but rather 
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are obtained via manipulation of a partially purified naturally occurring substance 
(messenger RNA). The conversion of mRNA into a cDNA library involves the creation of 
a synthetic substance (cDNA) and pure individual cDNA clones can be isolated from the 
synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA 
5 and subsequently isolating individual clones from that library results in an approximately 
10 4 -10 6 fold purification of the native message. Purification of starting material or natural 
material to at least one order of magnitude, preferably two or three orders, and more 
preferably four or five orders of magnitude is expressly contemplated. 

[0017] As used herein, the term "isolated" requires that the material be 

10 removed from its original environment (e.g., the natural environment if it is naturally 
occurring). For example, a naturally-occurring polynucleotide present in a living animal is 
not isolated, but the same polynucleotide, separated from some or all of the coexisting 
materials in the natural system, is isolated. 

[0018] As used herein, the term "recombinant" means that the extended 

15 cDNA is adjacent to "backbone" nucleic acid to which it is not adjacent in its natural 
environment. Additionally, to be "enriched" the extended cDNAs will represent 5% or 
more of the number of nucleic acid inserts in a population of nucleic acid backbone 
molecules. Backbone molecules according to the present invention include nucleic acids 
such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, 
20 and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of 
interest. Preferably, the enriched extended cDNAs represent 15% or more of the number 
of nucleic acid inserts in the population of recombinant backbone molecules. More 
preferably, the enriched extended cDNAs represent 50% or more of the number of nucleic 
acid inserts in the population of recombinant backbone molecules. In a highly preferred 
25 embodiment, the enriched extended cDNAs represent 90% or more of the number of 
nucleic acid inserts in the population of recombinant backbone molecules. "Stringent", 
"moderate," and "low" hybridization conditions are as defined in Example 29. 

[0019] Unless otherwise indicated, a "complementary" sequence is fully 

complementary. Thus, extended cDNAs encoding secreted polypeptides or fragments 
30 thereof which are present in cDNA libraries in which one or more extended cDNAs 
encoding secreted polypeptides or fragments thereof make up 5% or more of the number 
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of nucleic acid inserts in the backbone molecules are "enriched recombinant extended 
cDNAs" as defined herein. Likewise, extended cDNAs encoding secreted polypeptides or 
fragments thereof which are in a population of plasmids in which one or more extended 
cDNAs of the present invention have been inserted such that they represent 5% or more of 
5 the number of inserts in the plasmid backbone are " enriched recombinant extended 
cDNAs" as defined herein. However, extended cDNAs encoding secreted polypeptides or 
fragments thereof which are in cDNA libraries in which the extended cDNAs encoding 
secreted polypeptides or fragments thereof constitute less than 5% of the number of 
nucleic acid inserts in the population of backbone molecules, such as libraries in which 
10 backbone molecules having a cDNA insert encoding a secreted polypeptide are extremely 
rare, are not "enriched recombinant extended cDNAs." 

[0020] In particular, the present invention relates to extended cDNAs 

which were derived from genes encoding secreted proteins. As used herein, a "secreted" 
protein is one which, when expressed in a suitable host cell, is transported across or 
15 through a membrane, including transport as a result of signal peptides in its amino acid 

sequence. "Secreted" proteins include without limitation proteins secreted wholly (e.g. 
soluble proteins), or partially (e.g. receptors) from the cell in which they are expressed. 
"Secreted" proteins also include without limitation proteins which are transported across 
the membrane of the endoplasmic reticulum. 

20 [0021] Extended cDNAs encoding secreted proteins may include nucleic 

acid sequences, called signal sequences, which encode signal peptides which direct the 
extracellular secretion of the proteins encoded by the extended cDNAs. Generally, the 
signal peptides are located at the amino termini of secreted proteins. 

[0022] Secreted proteins are translated by ribosomes associated with the 

25 "rough" endoplasmic reticulum. Generally, secreted proteins are co-translationally 

transferred to the membrane of the endoplasmic reticulum. Association of the ribosome 
with the endoplasmic reticulum during translation of secreted proteins is mediated by the 
signal peptide. The signal peptide is typically cleaved following its co-translational entry 
into the endoplasmic reticulum. After delivery to the endoplasmic reticulum, secreted 
30 proteins may proceed through the Golgi apparatus. In the Golgi apparatus, the proteins 
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may undergo post-translational modification before entering secretory vesicles which 
transport them across the cell membrane. 

[0023] The extended cDNAs of the present invention have several 

important applications. For example, they may be used to express the entire secreted 
5 protein which they encode. Alternatively, they may be used to express portions of the 
secreted protein. The portions may comprise the signal peptides encoded by the extended 
cDNAs or the mature proteins encoded by the extended cDNAs (i.e. the proteins generated 
when the signal peptide is cleaved off). The portions may also comprise polypeptides 
having at least 10 consecutive amino acids encoded by the extended cDNAs. 
10 Alternatively, the portions may comprise at least 15 consecutive amino acids encoded by 
the extended cDNAs. In some embodiments, the portions may comprise at least 25 
consecutive amino acids encoded by the extended cDNAs. In other embodiments, the 
portions may comprise at least 40 amino acids encoded by the extended cDNAs. 

[0024] Antibodies which specifically recognize the entire secreted proteins 

15 encoded by the extended cDNAs or fragments thereof having at least 10 consecutive 
amino acids, at least 15 consecutive amino acids, at least 25 consecutive amino acids, or at 
least 40 consecutive amino acids may also be obtained as described below. Antibodies 
which specifically recognize the mature protein generated when the signal peptide is 
cleaved may also be obtained as described below. Similarly, antibodies which specifically 
20 recognize the signal peptides encoded by the extended cDNAs may also be obtained. 

[0025] In some embodiments, the extended cDNAs include the signal 

sequence. In other embodiments, the extended cDNAs may include the full coding 
sequence for the mature protein (i.e. the protein generated when the signal polypeptide is 
cleaved off). In addition, the extended cDNAs may include regulatory regions upstream of 

25 the translation start site or downstream of the stop codon which control the amount, 
location, or developmental stage of gene expression. As discussed above, secreted 
proteins are therapeutically important. Thus, the proteins expressed from the cDNAs may 
be useful in treating or controlling a variety of human conditions. The extended cDNAs 
may also be used to obtain the corresponding genomic DNA. The term "corresponding 

30 genomic DNA" refers to the genomic DNA which encodes mRNA which includes the 
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sequence of one of the strands of the extended cDNA in which thymidine residues in the 
sequence of the extended cDNA are replaced by uracil residues in the mRNA. 

[0026] The extended cDNAs or genomic DNAs obtained therefrom may 

be used in forensic procedures to identify individuals or in diagnostic procedures to 
5 identify individuals having genetic diseases resulting from abnormal expression of the 
genes corresponding to the extended cDNAs. In addition, the present invention is useful 
for constructing a high resolution map of the human chromosomes. 

[0027] The present invention also relates to secretion vectors capable of 

directing the secretion of a protein of interest. Such vectors may be used in gene therapy 
10 strategies in which it is desired to produce a gene product in one cell which is to be 
delivered to another location in the body. Secretion vectors may also facilitate the 
purification of desired proteins. 

[0028] The present invention also relates to expression vectors capable of 

directing the expression of an inserted gene in a desired spatial or temporal manner or at a 
15 desired level. Such vectors may include sequences upstream of the extended cDNAs such 
as promoters or upstream regulatory sequences. 

[0029] In addition, the present invention may also be used for gene therapy 

to control or treat genetic diseases. Signal peptides may also be fused to heterologous 
proteins to direct their extracellular secretion. 

20 [0030] One embodiment of the present invention is a purified or isolated 

nucleic acid comprising the sequence of one of SEQ ID NOs: 40-84 and 130-154 or a 
sequence complementary thereto. In one aspect of this embodiment, the nucleic acid is 
recombinant. 

[0031] Another embodiment of the present invention is a purified or 

25 isolated nucleic acid comprising at least 10 consecutive bases of the sequence of one of 

SEQ ID NOs: 40-84 and 130-154 or one of the sequences complementary thereto. la one 
aspect of this embodiment, the nucleic acid comprises at least 15, 25, 30, 40, 50, 75, or 100 
consecutive bases of one of the sequences of SEQ ID NOs: 40-84 and 130-154 or one of 
the sequences complementary thereto. The nucleic acid may be a recombinant nucleic 
30 acid. 
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[0032] Another embodiment of the present invention is a purified or 

isolated nucleic acid of at least 15 bases capable of hybridizing under stringent conditions 
to the sequence of one of SEQ ED NOs: 40-84 and 130-154 or a sequence complementary 
to one of the sequences of SEQ ID NOs: 40-84 and 130-154. In one aspect of this 
5 embodiment, the nucleic acid is recombinant. 

[0033] Another embodiment of the present invention is a purified or 

isolated nucleic acid comprising the full coding sequences of one of SEQ ID Nos: 40-84 
and 130-154 wherein the full coding sequence optionally comprises the sequence encoding 
signal peptide as well as the sequence encoding mature protein. In a preferred 
10 embodiment, the isolated or purified nucleic acid comprises the full coding sequence of 
one of SEQ ID Nos. 40-59, 61-73, 75, 77-82, and 130-154 wherein the full coding 
sequence comprises the sequence encoding signal peptide and the sequence encoding 
mature protein. In one aspect of this embodiment, the nucleic acid is recombinant. 

[0034] A further embodiment of the present invention is a purified or 

15 isolated nucleic acid comprising the nucleotides of one of SEQ ID NOs: 40-84 and 130- 
154 which encode a mature protein. In a preferred embodiment, the purified or isolated 
nucleic acid comprises the nucleotides of one of SEQ ID NOs: 40-59, 61-75, 77-82, and 
130-154 which encode a mature protein. In one aspect of this embodiment, the nucleic 
acid is recombinant. 

20 [0035] Yet another embodiment of the present invention is a purified or 

isolated nucleic acid comprising the nucleotides of one of SEQ ID NOs: 40-84 and 130- 
154 which encode the signal peptide. In a preferred embodiment, the purified or isolated 
nucleic acid comprises the nucleotides of SEQ ID NOs: 40-59, 61-73, 75-82, 84, and 130- 
1 54 which encode the signal peptide. In one aspect of this embodiment, the nucleic acid 

25 is recombinant. 

[0036] Another embodiment of the present invention is a purified or 

isolated nucleic acid encoding a polypeptide having the sequence of one of the sequences 
of SEQ ID NOs: 85-129 and 155-179. 

[0037] Another embodiment of the present invention is a purified or 

30 isolated nucleic acid encoding a polypeptide having the sequence of a mature protein 
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included in one of the sequences of SEQ ID NOs: 85-129 and 155-179. In a preferred 
embodiment, the purified or isolated nucleic acid encodes a polypeptide having the 
sequence of a mature protein included in one of the sequences of SEQ ID NOs: 85-1 04, 
106-120, 122-127, and 155-179. 

5 [0038] Another embodiment of the present invention is a purified or 

isolated nucleic acid encoding a polypeptide having the sequence of a signal peptide 
included in one of the sequences of SEQ ID NOs: 85-129 and 155-179. In a preferred 
embodiment, the purified or isolated nucleic acid encodes a polypeptide having the 
sequence of a signal peptide included in one of the sequences of SEQ ID NOs: 85-104, 
10 106-118, 120-127, 129, and 155-179. 

[0039] Yet another embodiment of the present invention is a purified or 

isolated protein comprising the sequence of one of SEQ ID NOs: 85-129 and 155- 179. 

[0040] Another embodiment of the present invention is a purified or 

isolated polypeptide comprising at least 10 consecutive amino acids of one of the 
15 sequences of SEQ ID NOs: 85-129 and 155- 179. hi one aspect of this embodiment, the 
purified or isolated polypeptide comprises at least 15, 20, 25, 35, 50, 75, 100, 150 or 200 
consecutive amino acids of one of the sequences of SEQ ID NOs: 85-129 and 155- 179. 
In still another aspect, the purified or isolated polypeptide comprises at least 25 
consecutive amino acids of one of the sequences of SEQ ID NOs: 85-129 and 155- 179. 

20 [0041] Another embodiment of the present invention is an isolated or 

purified polypeptide comprising a signal peptide of one of the polypeptides of SEQ ID 
NOs: 85-129 and 155- 179. In a preferred embodiment, the isolated or purified 
polypeptide comprises a signal peptide of one of the polypeptides of SEQ ID NOs: 85- 
104, 106-118, 120-127, 129, and 155-179. 

25 [0042] Yet another embodiment of the present invention is an isolated or 

purified polypeptide comprising a mature protein of one of the polypeptides of SEQ ID 
NOs: 85-129 and 155- 179. In a preferred embodiment, the isolated or purified 
polypeptide comprises a mature protein of one of the polypeptides of SEQ ID NOs: 85- 
104, 106-120, 122-127, and 155-179. In a preferred embodiment, the purified or isolated 
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nucleic acid encodes a polypeptide having the sequence of a mature protein included in 
one of the sequences of SEQ ID NOs: 85-104, 106-120, 122-127, and 155-179. 

[0043] A further embodiment of the present invention is a method of 

making a protein comprising one of the sequences of SEQ ID NO: 85-129 and 155-179, 
5 comprising the steps of obtaining a cDNA comprising one of the sequences of sequence of 
SEQ ID NO: 40-84 and 130-154, inserting the cDNA in an expression vector such that the 
cDNA is operably linked to a promoter, and introducing the expression vector into a host 
cell whereby the host cell produces the protein encoded by said cDNA. In one aspect of 
this embodiment, the method further comprises the step of isolating the protein. 

10 [0044] Another embodiment of the present invention is a protein 

obtainable by the method described in the preceding paragraph. 

[0045] Another embodiment of the present invention is a method of 

making a protein comprising the amino acid sequence of the mature protein contained in 
one of the sequences of SEQ ID NOs: 85-104, 106-120, 122-127, and 155-179 comprising 

15 the steps of obtaining a cDNA comprising one of the nucleotides sequence of sequence of 
SEQ ID NOs: 40-59, 61-75, 77-82, and 130-154 which encode for the mature protein, 
inserting the cDNA in an expression vector such that the cDNA is operably linked to a 
promoter, and introducing the expression vector into a host cell whereby the host cell 
produces the mature protein encoded by the cDNA. In one aspect of this embodiment, the 

20 method further comprises the step of isolating the protein. 

[0046] Another embodiment of the present invention is a mature protein 

obtainable by the method described in the preceding paragraph. 

[0047] In a preferred embodiment, the above method comprises a 

method of making a protein comprising the amino acid sequence of the mature protein 
25 contained in one of the sequences of SEQ ID NOs. 85-104, 106-120, 122-127 and 155- 
179, comprising the steps of obtaining a cDNA comprising one of the nucleotide 
sequences of SEQ ID Nos. 40-59, 61-75, 77-82 and 130-154 which encode for the 
mature protein, inserting the cDNA in an expression vector such that the cDNA is 
operably linked to a promoter, and introducing the expression vector into a host cell 
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whereby the host cell produces the mature protein encoded by the cDNA. In one aspect 
of this embodiment, the method further comprises the step of isolating the protein. 

[0048] Another embodiment of the present invention is a host cell 

containing the purified or isolated nucleic acids comprising the sequence of one of SEQ ID 
5 NOs: 40-84 and 130-154 or a sequence complementary thereto described herein. 

[0049] Another embodiment of the present invention is a host cell 

containing the purified or isolated nucleic acids comprising the full coding sequences of 
one of SEQ ID NOs: 40-59, 61-73, 75, 77-82, and 130-154, wherein the full coding 
sequence comprises the sequence encoding signal peptide and the sequence encoding 
1 0 mature protein described herein. 

[0050] Another embodiment of the present invention is a host cell 

containing the purified or isolated nucleic acids comprising the nucleotides of one of SEQ 
ID NOs: 40-84 and 130-154 which encode a mature protein which are described herein. 
Preferably, the host cell contains the purified or isolated nucleic acids comprising the 
15 nucleotides of one of SEQ ID NOs: 40-59, 61-75, 77-82, and 130-154 which encode a 
mature protein. 

[0051] Another embodiment of the present invention is a host cell 

containing the purified or isolated nucleic acids comprising the nucleotides of one of SEQ 
ID NOs: 40-84 and 130-154 which encode the signal peptide which are described herein. 
20 Preferably, the host cell contains the purified or isolated nucleic acids comprising the 
nucleotides of one of SEQ ID Nos.: 40-59, 61-73, 75-82, 84, and 130-154 which encode 
the signal peptide. 

[0052] Another embodiment of the present invention is a purified or 

isolated antibody capable of specifically binding to a protein having the sequence of one of 
25 SEQ ID NOs: 85-129 and 155-179. In one aspect of this embodiment, the antibody is 

capable of binding to a polypeptide comprising at least 10 consecutive amino acids of the 
sequence of one of SEQ ID NOs: 85-129 and 155-179. 

[0053] Another embodiment of the present invention is an array of cDNAs 

or fragments thereof of at least 15 nucleotides in length which includes at least one of the 
30 sequences of SEQ ID NOs: 40-84 and 130-154, or one of the sequences complementary to 
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the sequences of SEQ ID NOs: 40-84 and 130-154, or a fragment thereof of at least 15 
consecutive nucleotides. In one aspect of this embodiment, the array includes at least two 
of the sequences of SEQ ID NOs: 40-84 and 130-154, the sequences complementary to the 
sequences of SEQ ID NOs: 40-84 and 130-154, or fragments thereof of at least 15 
5 consecutive nucleotides. In another aspect of this embodiment, the array includes at least 
five of the sequences of SEQ ID NOs: 40-84 and 130-154, the sequences complementary 
to the sequences of SEQ ID NOs: 40-84 aud 130-154, or fragments thereof of at least 15 
consecutive nucleotides. 

[0054] A further embodiment of the invention encompasses purified 

10 polynucleotides comprising an insert from a clone deposited in a deposit having an 
accession number selected from the group consisting of the accession numbers listed in 
Table VI or a fragment thereof comprising a contiguous span of at least 8, 10, 12, 15, 20, 
25, 40, 60, 100, or 200 nucleotides of said insert. An additional embodiment of the 
invention encompasses purified polypeptides which comprise, consist of, or consist 
15 essentially of an amino acid sequence encoded by the insert from a clone deposited in a 
deposit having an accession number selected from the group consisting of the accession 
numbers listed in Table VI, as well as polypeptides which comprise a fragment of said 
amino acid sequence consisting of a signal peptide, a mature protein, or a contiguous span 
of at least 5, 8, 10, 12, 15, 20, 25, 40, 60, 100, or 200 amino acids encoded by said insert. 

20 [0055] An additional embodiment of the invention encompasses purified 

polypeptides which comprise a contiguous span of at least 5, 8, 10, 12, 15, 20, 25, 40, 60, 
100, or 200 amino acids of SEQ ID NOs: 85-129 and 155-179, wherein said contiguous 
span comprises at least one of the amino acid positions which was not shown to be 
identical to a public sequence in any of Figures 10 to 12. Also encompassed by the 

25 invention are purified polynucleotides encoding said polypeptides. 

[0056] Another embodiment of the present invention is a computer 

readable medium having stored thereon a sequence selected from the group consisting of a 
cDNA code of SEQID NOs. 40-84 and 130-154 and a polypeptide code of SEQ ID NOs. 
85-129 and 155-179. 

30 [0057] Another embodiment of the present invention is a computer system 

comprising a processor and a data storage device wherein the data storage device has 
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stored thereon a sequence selected from the group consisting of a cDNA code of SEQK) 
NOs. 40-84 and 130-154 and a polypeptide code of SEQ ID NOs. 85-129 and 155-179. In 
some embodiments the computer system further comprises a sequence comparer and a 
data storage device having reference sequences stored thereon. For example, the sequence 
5 comparer may comprise a computer program which indicates polymorphisms. In other 
aspects of the computer system, the system further comprises an identifier which identifies 
features in said sequence. 

[0058] Another embodiment of the present invention is a method for 

comparing a first sequence to a reference sequence wherein the first sequence is selected 
10 from the group consisting of a cDNA code of SEQK) NOs, 40-84 and 130-154 and a 
polypeptide code of SEQ ID NOs. 85-129 and 155-179 comprising the steps of reading the 
first sequence and the reference sequence through use of a computer program which 
compares sequences and 

determining differences between the first sequence and the reference sequence with the 
15 computer program. In some embodiments of the method, the step of determining 
differences between the first sequence and the reference sequence comprises identifying 
polymorphisms. 

[0059] Another embodiment of the present invention is a method for 

identifying a feature in a sequence selected from the group consisting of a cDNA code of 
20 SEQID NOs. 40-84 and 130-154 and apolypeptide code of SEQ ID NOs. 85-129 and 155- 
179 comprising the steps of reading the sequence through the use of a computer program 
which identifies features in sequences and identifying features in the sequence with said 
computer program. 

25 
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Brief Description of the Drawings 

[0060] Figure 1 is a summary of a procedure for obtaining cDNAs which 

have been selected to include the 5 f ends of the mRNAs from which they are derived. 

[0061] Figure 2 is an analysis of the 43 amino terminal amino acids of all 

5 human SwissProt proteins to determine the frequency of false positives and false negatives 
using the techniques for signal peptide identification described herein. 

[0062] Figure 3 shows the distribution of von Heijne scores for 5' ESTs in 

each of the categories described herein and the probability that these 5' ESTs encode a 
signal peptide. 

10 [0063] Figure 4 shows the distribution of 5' ESTs in each category and the 

number of 5' ESTs in each category having a given minimum von Heijne's score. 

[0064] Figure 5 shows the tissues from which the mRNAs corresponding 

to the 5' ESTs in each of the categories described herein were obtained. 

[0065] Figure 6 illustrates a method for obtaining extended cDNAs. 

15 [0066] Figure 7 is a map of pED6dpc2. 

[0067] Figure 8 provides a schematic description of the promoters isolated 

and the way they are assembled with the corresponding 5' tags. 

[0068] Figure 9 describes the transcription factor binding sites present in 

each of these promoters. 

20 [0069] Figure 10 is an alignment of the proteins of SEQ ID NOs: 120 

and 180 wherein the signal peptide is in italics, the predicted transmembrane segment is 
underlined, the experimentally determined transmembrane segment is double- 
underlined, and the ATP 1 G/PLMN/MAT8 signature is in bold. 

[0070] Figure 11 is an alignment of the proteins of SEQ ID NOs: 121 

25 and 181 wherein the predicted transmembrane segment is underlined. 

[0071] Figure 12 is an alignment of the proteins of SEQ ID NOs: 128 

and 182 wherein the PPPY motif is in bold. 
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Detailed Description of the Preferred Embodiment 

I. Obtaining 5' ESTs 

[0072] The present extended cDNAs were obtained using 5' ESTs which 

5 were isolated as described below. 

A. Chemical Methods for Obtaining mRNAs having Intact 5 f Ends 

[0073] In order to obtain the 5' ESTs used to obtain the extended cDNAs 

of the present invention, mRNAs having intact 5' ends must be obtained. Currently, there 
are two approaches for obtaining such mRNAs. One of these approaches is a chemical 

10 modification method involving derivatization of the 5' ends of the mRNAs and selection of 
the derivatized mRNAs. The 5' ends of eucaryotic mRNAs possess a structure referred to 
as a "cap" which comprises a guanosine methylated at the 7 position. The cap is joined to 
the first transcribed base of the mRNA by a 5', 5'-triphosphate bond. In some instances, 
the 5' guanosine is methylated in both the 2 and 7 positions. Rarely, the 5' guanosine is 

15 trimethylated at the 2, 7 and 7 positions. In the chemical method for obtaining mRNAs 
having intact 5' ends, the 5' cap is specifically derivatized and coupled to a reactive group 
on an immobilizing substrate. This specific derivatization is based on the fact that only the 
ribose linked to the methylated guanosine at the 5 r end of the mRNA and the ribose linked 
to the base at the 3' terminus of the mRNA, possess 2', 3'-cis diols. Optionally, where the 

20 3' terminal ribose has a 2', 3'-cis diol, the 2', 3'-cis diol at the 3' end may be chemically 
modified, substituted, converted, or eliminated, leaving only the ribose linked to the 
methylated guanosine at the 5' end of the mRNA with a 2', 3'-cis diol. A variety of 
techniques are available for eliminating the 2', 3'-cis diol on the J terminal ribose. For 
example, controlled alkaline hydrolysis may be used to generate mRNA fragments in 

25 which the 3' terminal ribose is a 3'-phosphate, 2 r -phosphate or (2', 3')-cyclophosphate. 
Thereafter, the fragment which includes the original 3 r ribose may be eliminated from the 
mixture through chromatography on an oligo-dT column. Alternatively, a base which 
lacks the 2', 3'-cis diol may be added to the 3 ' end of the mRNA using an RNA ligase such 
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as T4 RNA ligase. Example 1 below describes a method for ligation of pCp to the 3' end 
of messenger RNA. 

EXAMPLE 1 

5 Ligation of the Nucleoside Diphosphate pCp to the 3' End of Messenger RNA 

[0074J 1 \ig of RNA was incubated in a final reaction medium of 10 jjtl in 

the presence of 5 U of T 4 phage RNA ligase in the buffer provided by the manufacturer 
(Gibco - BRL), 40 U of the RNase inhibitor RNasin (Promega) and, 2 fil of 32 pCp 
(Amersham#PB 10208). 

10 [0075] The incubation was performed at 37°C for 2 hours or overnight at 

7-8°C 

[0076] Following modification or elimination of the 2\ 3'-cis diol at the 3' 

ribose, the 2\ 3'-cis diol present at the 5' end of the mRNA may be oxidized using reagents 
such as NaBH 4; NaBHsCN, or sodium periodate, thereby converting the 2\ 3'-cis diol to a 
15 dialdehyde. Example 2 describes the oxidation of the 2\ 3'-cis diol at the 5' end of the 
mRNA with sodium periodate. 

EXAMPLE 2 

Oxidation of 2\ 3 f -cis diol at the 5' End of the mRNA 

20 [0077] 0,1 OD unit of either a capped oligoribonucleotide of 47 

nucleotides (including the cap) or an uncapped oligoribonucleotide of 46 nucleotides were 
treated as follows. The oligoribonucleotides were produced by in vitro transcription using 
the transcription kit "AmpliScribe T7 M (Epicentre Technologies). As indicated below, the 
DNA template for the RNA transcript contained a single cytosine. To synthesize the 

25 uncapped RNA, all four NTPs were included in the in vitro transcription reaction. To 

obtain the capped RNA, GTP was replaced by an analogue of the cap, m7G(5 f )ppp(5')G. 
This compound, recognized by polymerase, was incorporated into the 5' end of the nascent 
transcript during the step of initiation of transcription but was not capable of incorporation 
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during the extension step. Consequently, the resulting RNA contained a cap at its 5' end. 
The sequences of the oligoribonucleotides produced by the in vitro transcription reaction 
were: 

+Cap: 

5 5'm7GpppGCAUCCUACUCCCAUCCAAUUCCACCCUAACUCCUCC 
CAUCUCCAC-3' (SEQIDNO:l) 

-Cap: 

5'- 

pppGCAUCCUACUCCCAUCCAAUUCCACCCUAACUCCUCCCAUC 
10 UCCAC-3' (SEQ ID NO:2) 

[0078] The oligoribonucleotides were dissolved in 9 \xl of acetate buffer 

(0,1 M sodium acetate, pH 5.2) and 3 jil of freshly prepared 0.1 M sodium periodate 
solution. The mixture was incubated for 1 hour in the dark at 4°C or room temperature. 
Thereafter, the reaction was stopped by adding 4 jal of 10% ethylene glycol The product 
15 was ethanol precipitated, resuspended in lOjil or more of water or appropriate buffer and 

dialyzed against water. 

[0079] The resulting aldehyde groups may then be coupled to molecules 

having a reactive amine group, such as hydrazine, carbazide, thiocarbazide or 
semicarbazide groups, in order to facilitate enrichment of the 5' ends of the mRNAs. 
20 Molecules having reactive amine groups which are suitable for use in selecting mRNAs 
having intact 5' ends include avidin, proteins, antibodies, vitamins, ligands capable of 
specifically binding to receptor molecules, or oligonucleotides. Example 3 below 
describes the coupling of the resulting dialdehyde to biotin. 
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EXAMPLE 3 

Coupling of the Dialdehyde with Biotin 

[0080] The oxidation product obtained in Example 2 was dissolved in 50 

\x\ of sodium acetate at a pH of between 5 and 5.2 and 50 jul of freshly prepared 0.02 M 
solution of biotin hydrazide in a methoxyethanol/water mixture (1 : 1) of fonmila: 



O 
II 



NH 2 NH ~C— (CH 2 )n — NH — C~ (CH 2 ) 4 - 




[0081] In the compound used in these experiments, n=5. However, it will 

be appreciated that other commercially available hydrazides may also be used, such as 
molecules of the formula above in which n varies from 0 to 5. 

[0082] The mixture was then incubated for 2 hours at 37°C. Following the 

incubation, the mixture was precipitated with ethanol and dialyzed against distilled water. 

[0083] Example 4 demonstrates the specificity of the biotinylation 

reaction. 



EXAMPLE 4 

S pecificity of Biotinylation 

[0084] The specificity of the biotinylation for capped mRNAs was 

evaluated by gel electrophoresis of the following samples: 

[0085] Sample 1 . The 46 nucleotide uncapped in vitro transcript prepared 

as in Example 2 and labeled with 32 pCp as described in Example L 
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[0086] Sample 2 . The 46 nucleotide uncapped in vitro transcript prepared 

as in Example 2, labeled with 32 pCp as described in Example 1, treated with the oxidation 
reaction of Example 2, and subjected to the biotinylation conditions of Example 3. 

[0087] Sample 3 . The 47 nucleotide capped in vitro transcript prepared as 

5 in Example 2 and labeled with 32 pCp as described in Example 1 . 

[0088] Sample 4 . The 47 nucleotide capped in vitro transcript prepared as 

in Example 2, labeled with 32 pCp as described in Example 1, treated with the oxidation 
reaction of Example 2, and subjected to the biotinylation conditions of Example 3. 

[0089] Samples 1 and 2 had identical migration rates, demonstrating that 

10 the uncapped RNAs were not oxidized and biotinylated. Sample 3 migrated more slowly 
than Samples 1 and 2, while Sample 4 exhibited the slowest migration. The difference in 
migration of the RNAs in Samples 3 and 4 demonstrates that the capped RNAs were 
specifically biotinylated. 

[0090] In some cases, mRNAs having intact 5' ends may be enriched by 

15 binding the molecule containing a reactive amine group to a suitable solid phase substrate 
such as the inside of the vessel containing the mRNAs, magnetic beads, chromatography 
matrices, or nylon or nitrocellulose membranes. For example, where the molecule having 
a reactive amine group is biotin, the solid phase substrate may be coupled to avidin or 
streptavidin. Alternatively, where the molecule having the reactive amine group is an 
20 antibody or receptor ligand, the solid phase substrate may be coupled to the cognate 
antigen or receptor. Finally, where the molecule having a reactive amine group comprises 
an oligonucleotide, the solid phase substrate may comprise a complementary 
oligonucleotide. 

[0091] The mRNAs having intact 5' ends may be released from the solid 

25 phase following the enrichment procedure. For example, where the dialdehyde is coupled 

to biotin hydrazide and the solid phase comprises streptavidin, the mRNAs may be 
released from the solid phase by simply heating to 95 degrees Celsius in 2% SDS. In some 
methods, the molecule having a reactive amine group may also be cleaved from the 
mRNAs having intact 5' ends following enrichment. Example 5 describes the capture of 
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biotinylated mRNAs with streptavidin coated beads and the release of the biotinylated 
mRNAs from the beads following enrichment. 

EXAMPLE 5 

5 Capture and Release of Biotinylated mRNAs Using Strepatividin Coated Beads 

[0092] The streptavidin-coated magnetic beads were prepared according to 

the manufacturer's instructions (CPG Inc., USA). The biotinylated mRNAs were added to 
a hybridization buffer (1.5 M NaCl, pH 5 - 6). After incubating for 30 minutes, the 
unbound and nonbiotinylated material was removed. The beads were washed several times 
10 in water with 1% SDS. The beads obtained were incubated for 15 minutes at 95°C in water 
containing 2% SDS. 

[0093] Example 6 demonstrates the efficiency with which biotinylated 

mRNAs were recovered from the streptavidin coated beads. 

15 EXAMPLE 6 

Efficiency of Recovery of Biotinylated mRNAs 

[0094] The efficiency of the recovery procedure was evaluated as follows. 

RNAs were labeled with 32 pCp, oxidized, biotinylated and bound to streptavidin coated 
beads as described above. Subsequently, the bound RNAs were incubated for 5, 15 or 30 
20 minutes at 95°C in the presence of 2% SDS. 

[0095] The products of the reaction were analyzed by electrophoresis on 

12% polyacrylamide gels under denaturing conditions (7 M urea). The gels were subjected 
to autoradiography. During this manipulation, the hydrazone bonds were not reduced. 

[0096] Increasing amounts of nucleic acids were recovered as incubation 

25 times in 2% SDS increased, demonstrating that biotinylated mRNAs were efficiently 
recovered. 

[0097] In an alternative method for obtaining mRNAs having intact 5' 

ends, an oligonucleotide which has been derivatized to contain a reactive amine group is 



-22- 



36US3.DIV 

specifically coupled to mRNAs having an intact cap. Preferably, the 3' end of the mRNA 
is blocked prior to the step in which the aldehyde groups are joined to the derivatized 
oligonucleotide, as described above, so as to prevent the derivatized oligonucleotide from 
being joined to the 3 f end of the mRNA. For example, pCp may be attached to the 3' end 
5 of the mRNA using T4 RNA ligase. However, as discussed above, blocking the 3' end of 
the mRNA is an optional step. Derivatized oligonucleotides may be prepared as described 
below in Example 7. 

EXAMPLE 7 

10 Derivatization of the Oligonucleotide 

[0098] An oligonucleotide phosphorylated at its 3' end was converted to a 

3' hydrazide in 3' by treatment with an aqueous solution of hydrazine or of dihydrazide of 
the formula H 2 N(R1)NH 2 at about 1 to 3 M, and at pH 4.5, in the presence of a 
carbodiimide type agent soluble in water such as l-ethyl-3-(3-dimethylaminopropyl)- 
15 carbodiimide at a final concentration of 0.3 M at a temperature of 8°C overnight. 

[0099] The derivatized oligonucleotide was then separated from the other 

agents and products using a standard technique for isolating oligonucleotides. 

[0100] As discussed above, the mRNAs to be enriched may be treated to 

eliminate the 3' OH groups which may be present thereon. This may be accomplished by 
20 enzymatic ligation of sequences lacking a 3' OH, such as pCp, as described above in 
Example L Alternatively, the 3' OH groups may be eliminated by alkaline hydrolysis as 
described in Example 8 below. 
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EXAMPLE 8 

Alkaline Hydrolysis o f mRNA 

[0101] The mRNAs may be treated with alkaline hydrolysis as follows. In 

a total volume of lOOfal of 0.1N sodium hydroxide, 1.5p,g mRNA is incubated for 40 to 60 
5 minutes at 4°C. The solution is neutralized with acetic acid and precipitated with ethanol. 

[0102] Following the optional elimination of the 3' OH groups, the diol 

groups at the 5' ends of the mRNAs are oxidized as described below in Example 9. 

EXAMPLE 9 

10 Oxidation of Dio Is 

[0103] Up to 1 OD unit of RNA was dissolved in 9 \d of buffer (0.1 M 

sodium acetate, pH 6-7 or water) and 3 (il of freshly prepared 0.1 M sodium periodate 
solution. The reaction was incubated for 1 h in the dark at 4°C or room temperature. 
Following the incubation, the reaction was stopped by adding 4 fj,l of 10% ethylene glycol 
15 Thereafter the mixture was incubated at room temperature for 15 minutes. After ethanol 
precipitation, the product was resuspended in 10(il or more of water or appropriate buffer 
and dialyzed against water. 

[0104] Following oxidation of the diol groups at the 5' ends of the 

mRNAs, the derivatized oligonucleotide was joined to the resulting aldehydes as described 
20 in Example 10. 

EXAMPLE 10 

Reaction of Aldehydes with Derivatized Oligonucleotides 

[0105] The oxidized mRNA was dissolved in an acidic medium such as 50 

25 jal of sodium acetate pH 4-6. 50 jil of a solution of the derivatized oligonucleotide was 

added such that an mRNAiderivatized oligonucleotide ratio of 1:20 was obtained and 
mixture was reduced with a borohydride. The mixture was allowed to incubate for 2 h at 
37°C or overnight (14 h) at 10°C The mixture was ethanol precipitated, resuspended in 
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lOjal or more of water or appropriate buffer and dialyzed against distilled water. If desired, 
the resulting product may be analyzed using acrylamide gel electrophoresis, HPLC 
analysis, or other conventional techniques. 

[0106] Following the attachment of the derivatized oligonucleotide to the 

5 mRNAs, a reverse transcription reaction may be performed as described in Example 11 
below. 

EXAMPLE 11 

Reverse Transcription of mRNAs 

10 [0107] An oligodeoxyribonucleotide was derivatized as follows. 3 OD 

units of an oligodeoxyribonucleotide of sequence 

ATCAAGAATTCGCACGAGACCATTA (SEQ ID NO:3) having 5'-OH and 3'-P ends 
were dissolved in 70 jal of a L5 M hydroxybenzotriazole solution, pH 5.3, prepared in 
dimethylformamide/water (75:25) containing 2 jug of l-ethyl-3-(3- 

15 dimethylaminopropyl)carbodiimide. The mixture was incubated for 2 h 30 min at 22°C. 
The mixture was then precipitated twice in LiCKVacetone, The pellet was resuspended in 
200 jlxI of 0.25 M hydrazine and incubated at 8°C from 3 to 14 h. Following the hydrazine 
reaction, the mixture was precipitated twice in LiClCVacetone. 

[0108] The messenger RNAs to be reverse transcribed were extracted from 

20 blocks of placenta having sides of 2 cm which had been stored at -80°C. The mRNA was 
extracted using conventional acidic phenol techniques. Oligo-dT chromatography was 
used to purify the mRNAs. The integrity of the mRNAs was checked by Northern-blotting. 

[0109] The diol groups on 7 jig of the placental mRNAs were oxidized as 

described above in Example 9. The derivatized oligonucleotide was joined to the mRNAs 
25 as described in Example 10 above except that the precipitation step was replaced by an 

exclusion chromatography step to remove derivatized oligodeoxyribonucleotides which 
were not joined to mRNAs. Exclusion chromatography was performed as follows: 

[0110] 10 ml of AcA34 (BioSepra#230151) gel were equilibrated in 50 ml 

of a solution of 10 mM Tris pH 8.0, 300 mM NaCl, 1 mM EDTA, and 0.05% SDS. The 
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mixture was allowed to sediment. The supernatant was eliminated and the gel was 
resuspended in 50 ml of buffer. This procedure was repeated 2 or 3 times. 

[0111] A glass bead (diameter 3 mm) was introduced into a 2 ml 

disposable pipette (length 25 cm). The pipette was filled with the gel suspension until the 
5 height of the gel stabilized at 1 cm from the top of the pipette. The column was then 
equilibrated with 20 ml of equilibration buffer (10 mM Tris HC1 pH 7.4, 20 mM NaCl). 

[0112] 10 \xl of the mRNA which had been reacted with the derivatized 

oligonucleotide were mixed in 39 |il of 10 mM urea and 2 |^1 of blue-glycerol buffer, 
which had been prepared by dissolving 5 mg of bromophenol blue in 60% glycerol (v/v), 
10 and passing the mixture through a filter with a filter of diameter 0.45 ^im. 

[0113] The column was loaded. As soon as the sample had penetrated, 

equilibration buffer was added. 100 (il fractions were collected. Derivatized 
oligonucleotide which had not been attached to mRNA appeared in fraction 16 and later 
fractions. Fractions 3 to 15 were combined and precipitated with ethanol. 

15 [0114] The mRNAs which had been reacted with the derivatized 

oligonucleotide were spotted on a nylon membrane and hybridized to a radioactive probe 
using conventional techniques. The radioactive probe used in these hybridizations was an 
oligodeoxyribonucleotide of sequence TAATGGTCTCGTGCGAATTCTTGAT (SEQ ID 
NO:4) which was anticomplementary to the derivatized oligonucleotide and was labeled at 

20 its 5' end with 32 P. 1/1 0th of the mRNAs which had been reacted with the derivatized 
oligonucleotide was spotted in two spots on the membrane and the membrane was 
visualized by autoradiography after hybridization of the probe. A signal was observed, 
indicating that the derivatized oligonucleotide had been joined to the mRNA. 

[01 1 5] The remaining 9/1 0 of the mRNAs which had been reacted with the 

25 derivatized oligonucleotide was reverse transcribed as follows. A reverse transcription 
reaction was carried out with reverse transcriptase following the manufacturer's 
instructions. To prime the reaction, 50 pmol of nonamers with random sequence were 
used. 

[0116] A portion of the resulting cDNA was spotted on a positively 

30 charged nylon membrane using conventional methods. The cDNAs were spotted on the 
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membrane after the cDNA:RNA heteroduplexes had been subjected to an alkaline 
hydrolysis in order to eliminate the RNAs. An oligonucleotide having a sequence identical 
to that of the derivatized oligonucleotide was labeled at its 5' end with 32 P and hybridized 
to the cDNA blots using conventional techniques. Single-stranded cDNAs resulting from 
5 the reverse transcription reaction were spotted on the membrane. As controls, the blot 
contained 1 pmol, 100 finol, 50 finol, 10 finol and 1 finol respectively of a control 
oligodeoxyribonucleotide of sequence identical to that of the derivatized oligonucleotide. 
The signal observed in the spots containing the cDNA indicated that approximately 15 
finol of the derivatized oligonucleotide had been reverse transcribed. 

10 [01171 These results demonstrate that the reverse transcription can be 

performed through the cap and, in particular, that reverse transcriptase crosses the 
5'-P-P-P-5' bond of the cap of eukaryotic messenger RNAs. 

[0118] The single stranded cDNAs obtained after the above first strand 

synthesis were used as template for PCR reactions. Two types of reactions were carried 
15 out. First, specific amplification of the mRNAs for the alpha globin, dehydrogenase, ppl5 
and elongation factor E4 were carried out using the following pairs of 
oligodeoxyribonucleotide primers. 

alpha-globin 

GLO-S: CCG ACA AGA CCA ACG TCA AGG CCG C (SEQ ID NO:5) 
20 GLO-As: TCA CCA GCA GGC AGT GGC TTA GGA G 3 f (SEQ ID NO:6) 

dehydrogenase 

3 DH-S: AGT GAT TCC TGC TAC TTT GGA TGG C (SEQ ID NO:7) 
3 DH-As: GCT TGG TCT TGT TCT GGA GTT TAG A (SEQ ID NO:8) 
ppl5 

25 PP1 5-S: TCC AGA ATG GGA GAC AAG CCA ATT T (SEQ ID NO:9) 

PP15-As: AGG GAG GAG GAA ACA GCG TGA GTC C (SEQ ID NO:10) 

Elongation factor E4 
EFA1-S: ATG GGA AAG GAA AAG ACT CAT ATC A (SEQ ID NO:l 1) 
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EFIA-As: AGC AGC AAC AAT CAG GAC AGC ACA G (SEQ ID N0:12) 

[0119] Non specific amplifications were also carried out with the antisense 

(_As) oligodeoxyribonucleotides of the pairs described above and a primer 
chosen from the sequence of the derivatized oligodeoxyribonucleotide 
5 (ATCAAGAATTCGCACGAGACCATTA) (SEQ ID NO:13). 

[0120] A 1.5% agarose gel containing the following samples 

corresponding to the PCR products of reverse transcription was stained with ethidium 
bromide. (l/20th of the products of reverse transcription were used for each PCR reaction). 

[0121] Sample 1 : The products of a PCR reaction using the globin primers 

10 of SEQ ID NOs 5 and 6 in the presence of cDNA. 

[0122] Sample 2 : The products of a PCR reaction using the globin primers 

of SEQ ID NOs 5 and 6 in the absence of added cDNA. 

[0123] Sample 3 : The products of a PCR reaction using the dehydrogenase 

primers of SEQ ID NOs 7 and 8 in the presence of cDNA. 

15 [0124] Sample 4 : The products of a PCR reaction using the 

dehydrogenase primers of SEQ ID NOs 7 and 8 in the absence of added cDNA, 

[0125] Sample 5 : The products of a PCR reaction using the ppl5 primers 

of SEQ ID NOs 9 and 10 in the presence of cDNA. 

[0126] Sample 6 : The products of a PCR reaction using the ppl5 primers 

20 of SEQ ID NOs 9 and 1 0 in the absence of added cDNA. 

[0127] Sample 7 : The products of a PCR reaction using the EEE4 primers 

of SEQ ID NOs 1 1 and 12 in the presence of added cDNA. 

[0128] Sample 8 : The products of a PCR reaction using the EIE4 primers 

of SEQ ID NOs 1 1 and 12 in the absence of added cDNA. 

25 [0129] In Samples 1 ? 3, 5 and 7, a band of the size expected for the PCR 

product was observed, indicating the presence of the corresponding sequence in the cDNA 
population. 
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[0130] PCR reactions were also carried out with the antisense 

oligonucleotides of the globin and dehydrogenase primers (SEQ ID NOs 6 and 8) and an 
oligonucleotide whose sequence corresponds to that of the derivatized oligonucleotide. 
The presence of PCR products of the expected size in the samples corresponding to 
5 samples 1 and 3 above indicated that the derivatized oligonucleotide had been 
incorporated. 

[0131] The above examples summarize the chemical procedure for 

enriching mRNAs for those having intact 5' ends. Further detail regarding the chemical 
approaches for obtaining mRNAs having intact 5' ends are disclosed in International 

10 Application No. W096/34981, published November 7, 1996, which is incorporated herein 

by reference. Strategies based on the above chemical modifications to the 5' cap 

structure may be utilized to generate cDNAs which have been selected to include the 5' 
ends of the mRNAs from which they are derived. In one version of such procedures, the 5' 
ends of the mRNAs are modified as described above. Thereafter, a reverse transcription 

15 reaction is conducted to extend a primer complementary to the mRNA to the 5' end of the 
mRNA. Single stranded RNAs are eliminated to obtain a population of cDNA/mRNA 
heteroduplexes in which the mRNA includes an intact 5' end. The resulting 
heteroduplexes may be captured on a solid phase coated with a molecule capable of 
interacting with the molecule used to derivatize the 5' end of the mRNA. Thereafter, the 

20 strands of the heteroduplexes are separated to recover single stranded first cDNA strands 
which include the 5' end of the mRNA. Second strand cDNA synthesis may then proceed 
using conventional techniques. For example, the procedures disclosed in WO 96/34981 or 
in Caminci, P. et al. High-Efficiency Full-Length cDNA Cloning by Biotinylated CAP 
Trapper. Genomics 37:327-336 (1996), the disclosures of which are incoiporated herein 

25 by reference, may be employed to select cDNAs which include the sequence derived from 
the 5' end of the coding sequence of the mRNA. 

[0132] Following ligation of the oligonucleotide tag to the 5' cap of the 

mRNA, a reverse transcription reaction is conducted to extend a primer complementary to 
the mRNA to the 5' end of the mRNA. Following elimination of the RNA component of 
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the resulting heteroduplex using standard techniques, second strand cDNA synthesis is 
conducted with a primer complementary to the oligonucleotide tag. 

[0133] Figure 1 summarizes the above procedures for obtaining cDNAs 

which have been selected to include the 5' ends of the mRNAs from which they are 
5 derived. 

B. Enzymatic Methods for Obtaining mRNAs having Intact 5' Ends 

[0134] Other techniques for selecting cDNAs extending to the 5' end of the 

mRNA from which they are derived are fully enzymatic. Some versions of these 
10 techniques are disclosed in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI 
University, Le clonage des ADNc complets: difficultes et perspectives nouvelles. Apports 
pour Tetude de la regulation de Texpression de la tryptophane hydroxylase de rat, 20 Dec. 
1993), EPO 625572 and Kato et al. Construction of a Human Full-Length cDNA Bank. 
Gene 150:243-250 (1994), the disclosures of which are incorporated herein by reference. 

15 [0135] Briefly, in such approaches, isolated mRNA is treated with alkaline 

phosphatase to remove the phosphate groups present on the 5' ends of uncapped 
incomplete mRNAs. Following this procedure, the cap present on full length mRNAs is 
enzymatically removed with a decapping enzyme such as T4 polynucleotide kinase or 
tobacco acid pyrophosphatase. An oligonucleotide, which may be either a DNA 

20 oligonucleotide or a DNA-RNA hybrid oligonucleotide having RNA at its 3' end, is then 

ligated to the phosphate present at the 5' end of the decapped mRNA using T4 RNA ligase. 
The oligonucleotide may include a restriction site to facilitate cloning of the cDNAs 
following their synthesis. Example 12 below describes one enzymatic method based on 
the doctoral thesis of Dumas. 

25 

EXAMPLE 12 

Enzymatic Approach for Obtaining 5 r ESTs 

[0136] Twenty micrograms of PolyA+ RNA were dephosphorylated using 

Calf Intestinal Phosphatase (Biolabs). After a phenol chloroform extraction, the cap 
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structure of mRNA was hydrolyzed using the Tobacco Acid Pyrophosphatase (purified as 
described by Shinshi et al, Biochemistry 15: 2185-2190, 1976) and a hemi 5'DNA/RNA- 
3' oligonucleotide having an unphosphorylated 5' end, a stretch of adenosine 
ribophosphate at the 3' end, and an EcoRI site near the 5' end was ligated to the 5'P ends of 
5 mRNA using the T4 RNA ligase (Biolabs). Oligonucleotides suitable for use in this 
procedure are preferably 30-50 bases in length. Oligonucleotides having an 
unphosphorylated 5' end may be synthesized by adding a fluorochrome at the 5' end. The 
inclusion of a stretch of adenosine ribophosphates at the 3 r end of the oligonucleotide 
increases ligation efficiency. It will be appreciated that the oligonucleotide may contain 
1 0 cloning sites other than EcoRI. 

[0137] Following ligation of the oligonucleotide to the phosphate present 

at the 5' end of the decapped mRNA, first and second strand cDNA synthesis may be 
carried out using conventional methods or those specified in EPO 625,572 and Kato et al. 
Construction of a Human Full-Length cDNA Bank. Gene 150:243-250 (1994), and Dumas 

15 Milne Edwards, supra, the disclosures of which are incorporated herein by reference. The 
resulting cDNA may then be ligated into vectors such as those disclosed in Kato et al. 
Construction of a Human Full-Length cDNA Bank. Gene 150:243-250 (1994) or other 
nucleic acid vectors known to those skilled in the art using techniques such as those 
described in Sambrook et al, Molecular Cloning: A Laboratory Manual 2d Ed., Cold 

20 Spring Harbor Laboratory Press, 1989, the disclosure of which is incorporated herein by 
reference. 

II. Characterization of 5'ESTs 

[0138] The above chemical and enzymatic approaches for enriching 

25 mRNAs having intact 5' ends were employed to obtain 5' ESTs. First, mRNAs were 
prepared as described in Example 13 below. 
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EXAMPLE 13 

Preparation of mRNA 

[0139] Total human RNAs or PolyA+ RNAs derived from 29 different 

tissues were respectively purchased from LABIMO and CLONTECH and used to generate 
5 44 cDNA libraries as described below. The purchased RNA had been isolated from cells 
or tissues using acid guanidium thiocyanate-phenol-chloroform extraction (Chomczyniski, 
P and Sacchi, N, Analytical Biochemistry 162:156-159, 1987). PolyAf RNA was 
isolated from total RNA (LABIMO) by two passes of oligodT chromatography, as 
described by Aviv and Leder (Aviv, H, and Leder, P., Proc. Natl. Acad. Sci. USA 
10 69:1408-1412, 1972) in order to eliminate ribosomal RNA. 

[0140] The quality and the integrity of the poly A+ were checked. 

Northern blots hybridized with a globin probe were used to confirm that the mRNAs were 
not degraded. Contamination of the PolyA+ mRNAs by ribosomal sequences was 
checked using RNAs blots and a probe derived from the sequence of the 28S RNA. 
15 Preparations of mRNAs with less than 5% of ribosomal RNAs were used in library 
construction. To avoid constructing libraries with RNAs contaminated by exogenous 
sequences (prokaryotic or fungal), the presence of bacterial 16S ribosomal sequences or of 
two highly expressed mRNAs was examined using PCR. 

[0141] Following preparation of the mRNAs, the above described 

20 chemical and/or the enzymatic procedures for enriching mRNAs having intact 5' ends 
discussed above were employed to obtain 5' ESTs from various tissues. In both approaches 
an oligonucleotide tag was attached to the cap at the 5' ends of the mRNAs. The 
oligonucleotide tag had an EcoRI site therein to facilitate later cloning procedures. 

[0142] Following attachment of the oligonucleotide tag to the mRNA by 

25 either the chemical or enzymatic methods, the integrity of the mRNA was examined by 
performing a Northern blot with 200-500ng of mRNA using a probe complementary to the 
oligonucleotide tag. 
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EXAMPLE 14 

cDNA Synthesis Using mRNA Templates Having Intact 5' Ends 

[0143] For the mRNAs joined to oligonucleotide tags using both the 

chemical and enzymatic methods, first strand cDNA synthesis was performed using 
5 reverse transcriptase with random nonamers as primers. In order to protect internal EcoRI 
sites in the cDNA from digestion at later steps in the procedure, methylated dCTP was 
used for first strand synthesis. After removal of RNA by an alkaline hydrolysis, the first 
strand of cDNA was precipitated using isopropanol in order to eliminate residual primers. 

[0144] For both the chemical and the enzymatic methods, the second 

10 strand of the cDNA was synthesized with a Klenow fragment using a primer 
corresponding to the 5 'end of the ligated oligonucleotide described in Example 12. 
Preferably, the primer is 20-25 bases in length. Methylated dCTP was also used for 
second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion 
during the cloning process. 

15 [0145] Following cDNA synthesis, the cDNAs were cloned into 

pBlueScript as described in Example 15 below. 
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EXAMPLE 15 

Insertion of cDNAs into BlueScript 

[0146] Following second strand synthesis, the ends of the cDNA were 

blunted with T4 DNA polymerase (Biolabs) and the cDNA was digested with EcoRL 
5 Since methylated dCTP was used during cDNA synthesis, the EcoRI site present in the tag 
was the only site which was hemi-methylated. Consequently, only the EcoRI site in the 
oligonucleotide tag was susceptible to EcoRI digestion. The cDNA was then size 
fractionated using exclusion chromatography (AcA, Biosepra). Fractions corresponding to 
cDNAs of more than 150 bp were pooled and ethanol precipitated. The cDNA was 
10 directionally cloned into the Smal and EcoRI ends of the phagemid pBlueScript vector 
(Stratagene). The ligation mixture was electroporated into bacteria and propagated under 
appropriate antibiotic selection. 

[0147] Clones containing the oligonucleotide tag attached were selected as 

described in Example 16 below. 

15 

EXAMPLE 16 

Selection of Clones Having the Oligonucleotide Tag Attached Thereto 

[0148] The plasmid DNAs containing 5' EST libraries made as described 

above were purified (Qiagen). A positive selection of the tagged clones was performed as 

20 follows. Briefly, in this selection procedure, the plasmid DNA was converted to single 

stranded DNA using gene II endonuclease of the phage Fl in combination with an 
exonuclease (Chang et al. ? Gene 127:95-8, 1993) such as exonuclease III or T7 gene 6 
exonuclease. The resulting single stranded DNA was then purified using paramagnetic 
beads as described by Fry et al., Biotechniques> 13: 124-131, 1992. In this procedure, the 

25 single stranded DNA was hybridized with a biotinylated oligonucleotide having a 

sequence corresponding to the 3' end of the oligonucleotide described in Example 13. 
Preferably, the primer has a length of 20-25 bases. Clones including a sequence 
complementary to the biotinylated oligonucleotide were captured by incubation with 
streptavidin coated magnetic beads followed by magnetic selection. After capture of the 

30 positive clones, the plasmid DNA was released from the magnetic beads and converted 
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into double stranded DNA using a DNA polymerase such as the ThermoSequenase 
obtained from Amersham Pharmacia Biotech. Alternatively, protocols such as the Gene 
Trapper kit (Gibco BRL) may be used. The double stranded DNA was then electroporated 
into bacteria. The percentage of positive clones having the 5' tag oligonucleotide was 
5 estimated to typically rank between 90 and 98% using dot blot analysis. 

[0149] Following electroporation, the libraries were ordered in 384- 

microtiter plates (MTP). A copy of the MTP was stored for future needs. Then the 
libraries were transferred into 96 MTP and sequenced as described below. 

10 EXAMPLE 17 

SequencinR of Inserts in Selected Clones 

[0150] Plasmid inserts were first amplified by PCR on PE 9600 

thermocyclers (Perkin-Elmer), using standard SETA-A and SETA-B primers (Genset SA) ? 
AmpliTaqGold (Perkin-Elmer), dNTPs (Boehringer), buffer and cycling conditions as 
1 5 recommended by the Perkin-Elmer Corporation. 

[0151] PCR products were then sequenced using automatic ABI Prism 377 

sequencers (Perkin Elmer, Applied Biosystems Division, Foster City, CA). Sequencing 
reactions were performed using PE 9600 thermocyclers (Perkin Elmer) with standard dye- 
primer chemistry and ThermoSequenase (Amersham Life Science). The primers used 
20 were either T7 or 21M13 (available from Genset SA) as appropriate. The primers were 
labeled with the JOE, FAM, ROX and TAMRA dyes. The dNTPs and ddNTPs used in the 
sequencing reactions were purchased from Boehringer. Sequencing buffer, reagent 
concentrations and cycling conditions were as recommended by Amersham. 

[0152] Following the sequencing reaction, the samples were precipitated 

25 with EtOH, resuspended in formamide loading buffer, and loaded on a standard 4% 
acrylamide gel. Electrophoresis was performed for 2.5 hours at 3000V on an ABI 377 
sequencer, and the sequence data were collected and analyzed using the ABI Prism DNA 
Sequencing Analysis Software, version 2.1.2. 
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[0153] The sequence data from the 44 cDNA libraries made as described 

above were transferred to a proprietary database, where quality control and validation steps 
were performed. A proprietary base-caller ("Trace"), working using a Unix system 
automatically flagged suspect peaks, taking into account the shape of the peaks, the inter- 
5 peak resolution, and the noise level The proprietary base-caller also performed an 
automatic trimming. Any stretch of 25 or fewer bases having more than 4 suspect peaks 
was considered unreliable and was discarded. Sequences corresponding to cloning vector 
or ligation oligonucleotides were automatically removed from the EST sequences. 
However, the resulting EST sequences may contain 1 to 5 bases belonging to the above 
10 mentioned sequences at their 5' end. If needed, these can easily be removed on a case by 
case basis. 

[0154] Thereafter, the sequences were transferred to the proprietary 

NETGENE™ Database for further analysis as described below. 

[0155] Following sequencing as described above, the sequences of the 5' 

15 ESTs were entered in a proprietary database called NETGENE™ for storage and 

manipulation. It will be appreciated by those skilled in the art that the data could be stored 
and manipulated on any medium which can be read and accessed by a computer. 
Computer readable media include magnetically readable media, optically readable media, 
or electronically readable media. For example, the computer readable media may be a 
20 hard disc, a floppy disc, a magnetic tape, CD-ROM, RAM, or ROM as well as other types 
of other media known to those skilled in the art. 

[0156] In addition, the sequence data may be stored and manipulated in a 

variety of data processor programs in a variety of formats. For example, the sequence data 
may be stored as text in a word processing file, such as MicrosoftWORD or 
25 WORDPERFECT or as an ASCII file in a variety of database programs familiar to those 
of skill in the art, such as DB2, SYBASE, or ORACLE. 

[0157] The computer readable media on which the sequence information is 

stored may be in a personal computer, a network, a server or other computer systems 
known to those skilled in the art. The computer or other system preferably includes the 
30 storage media described above, and a processor for accessing and manipulating the 
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sequence data. Once the sequence data has been stored it may be manipulated and 
searched to locate those stored sequences which contain a desired nucleic acid sequence or 
which encode a protein having a particular functional domain. For example, the stored 
sequence information may be compared to other known sequences to identify homologies, 
5 motifs implicated in biological function, or structural motifs. 

[0158] Programs which may be used to search or compare the stored 

sequences include the MacPattern (EMBL), BLAST, and BLAST2 program series 
(NCBI), basic local alignment search tool programs for nucleotide (BLASTN) and peptide 
(BLASTX) comparisons (Altschul et al, J. Mol. BioL 215: 403 (1990)) and FASTA 
10 (Pearson and Lipman, Proc. Natl. Acad. ScL USA, 85: 2444 (1988)). The BLAST 
programs then extend the alignments on the basis of defined match and mismatch criteria. 

[0159] Motifs which may be detected using the above programs include 

sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, 
ubiquitination sites, alpha helices, and beta sheets, signal sequences encoding signal 
15 peptides which direct the secretion of the encoded proteins, sequences implicated in 
transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, 
substrate binding sites, and enzymatic cleavage sites. 

[0160] Before searching the cDNAs in the NETGENE™ database for 

sequence motifs of interest, cDNAs derived from mRNAs which were not of interest were 
20 identified and eliminated from further consideration as described in Example 1 8 below. 

EXAMPLE 18 

Elimination of Undesired Sequences from Further Consideration 

[0161] 5' ESTs in the NETGENE™ database which were derived from 

25 undesired sequences such as transfer RNAs, ribosomal RNAs, mitochondrial RNAs, 

procaryotic RNAs, fungal RNAs, Alu sequences, LI sequences, or repeat sequences were 
identified using the FASTA and BLASTN programs with the parameters listed in Table II. 

[0162] To eliminate 5' ESTs encoding tRNAs from further consideration, 

the 5' EST sequences were compared to the sequences of 1190 known tRNAs obtained 
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from EMBL release 38, of which 100 were human. The comparison was performed using 
FASTA on both strands of the 5 r ESTs. Sequences having more than 80% homology over 
more than 60 nucleotides were identified as tRNA. Of the 144,341 sequences screened, 26 
were identified as tRNAs and eliminated from further consideration. 

5 [0163] To eliminate 5' ESTs encoding rRNAs from further consideration, 

the 5' EST sequences were compared to the sequences of 2497 known rRNAs obtained 
from EMBL release 38, of which 73 were human. The comparison was performed using 
BLASTN on both strands of the 5' ESTs with the parameter S-108. Sequences having 
more than 80% homology over stretches longer than 40 nucleotides were identified as 
10 rRNAs. Of the 144,341 sequences screened, 3,312 were identified as rRNAs and 
eliminated from further consideration. 

[0164] To eliminate 5' ESTs encoding mtRNAs from further 

consideration, the 5' EST sequences were compared to the sequences of the two known 
mitochondrial genomes for which the entire genomic sequences are available and all 

15 sequences transcribed from these mitochondrial genomes including tRNAs, rRNAs, and 

mRNAs for a total of 38 sequences. The comparison was performed using BLASTN on 
both strands of the 5' ESTs with the parameter S=108. Sequences having more than 80% 
homology over stretches longer than 40 nucleotides were identified as mtRNAs. Of the 
144,341 sequences screened, 6,110 were identified as mtRNAs and eliminated from 

20 further consideration. 

[0165] Sequences which might have resulted from exogenous 

contaminants were eliminated from further consideration by comparing the 5' EST 
sequences to release 46 of the EMBL bacterial and fungal divisions using BLASTN with 
the parameter S=144 All sequences having more than 90% homology over at least 40 

25 nucleotides were identified as exogenous contaminants. Of the 42 cDNA libraries 

examined, the average percentages of procaryotic and fungal sequences contained therein 
were 0.2% and 0.5% respectively. Among these sequences, only one could be identified 
as a sequence specific to fungi. The others were either fungal or procaryotic sequences 
having homologies with vertebrate sequences or including repeat sequences which had not 

30 been masked during the electronic comparison. 
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[0166] In addition, the 5' ESTs were compared to 6093 Alu sequences and 

1115 LI sequences to mask 5' ESTs containing such repeat sequences from further 
consideration. 5' ESTs including THE and MER repeats, SSTR sequences or satellite, 
micro-satellite, or telomeric repeats were also eliminated from further consideration. On 
5 average, 11.5% of the sequences in the libraries contained repeat sequences. Of this 
11.5%, 7% contained Alu repeats, 3.3% contained LI repeats and the remaining 1.2% 
were derived from the other types of repetitive sequences which were screened. These 
percentages are consistent with those found in cDNA libraries prepared by other groups. 
For example, the cDNA libraries of Adams et al contained between 0% and 7.4% Alu 
10 repeats depending on the source of the RNA which was used to prepare the cDNA library 
(Adams et al, Nature 377:174, 1996). 

[0167] The sequences of those 5' ESTs remaining after the elimination of 

undesirable sequences were compared with the sequences of known human mRNAs to 
determine the accuracy of the sequencing procedures described above. 

15 

EXAMPLE 19 

Measurement of Sequencing Accuracy by Comparison to Known Sequences 

[0168] To further determine the accuracy of the sequencing procedure 

described above, the sequences of 5' ESTs derived from known sequences were identified 

20 and compared to the known sequences. First, a FASTA analysis with overhangs shorter 
than 5 bp on both ends was conducted on the 5' ESTs to identify those matching an entry 
in the public human mRNA database. The 6655 5' ESTs which matched a known human 
mRNA were then realigned with their cognate mRNA and dynamic programming was 
used to include substitutions, insertions, and deletions in the list of "errors" which would 

25 be recognized. Errors occurring in the last 10 bases of the 5' EST sequences were ignored 

to avoid the inclusion of spurious cloning sites in the analysis of sequencing accuracy. 

[0169] This analysis revealed that the sequences incorporated in the 

NETGENE™ database had an accuracy of more than 99.5%. 
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[0170] To determine the efficiency with which the above selection 

procedures select cDNAs which include the 5' ends of their corresponding mRNAs, the 
following analysis was performed. 

5 EXAMPLE 20 

Determination of Efficiency of 5 r EST Selection 

[0171] To determine the efficiency at which the above selection 

procedures isolated 5' ESTs which included sequences close to the 5' end of the mRNAs 
from which they were derived, the sequences of the ends of the 5' ESTs which were 
10 derived from the elongation factor 1 subunit a and ferritin heavy chain genes were 
compared to the known cDNA sequences for these genes. Since the transcription start 
sites for the elongation factor 1 subunit a and ferritin heavy chain are well characterized, 
they may be used to determine the percentage of 5' ESTs derived from these genes which 
included the authentic transcription start sites. 

15 [0172] For both genes, more than 95% of the cDNAs included sequences 

close to or upstream of the 5' end of the corresponding mRNAs. 

[0173] To extend the analysis of the reliability of the procedures for 

isolating 5' ESTs from ESTs in the NETGENE™ database, a similar analysis was 
conducted using a database composed of human mRNA sequences extracted from 

20 GenBank database release 97 for comparison. For those 5' ESTs derived from mRNAs 
included in the GeneBank database, more than 85% had their 5' ends close to the 5 r ends 
of the known sequence. As some of the mRNA sequences available in the GenBank 
database are deduced from genomic sequences, a 5' end matching with these sequences 
will be counted as an internal match. Thus, the method used here underestimates the yield 

25 of ESTs including the authentic 5' ends of their corresponding mRNAs. 

[0174] The EST libraries made above included multiple 5' ESTs derived 

from the same mRNA. The sequences of such 5' ESTs were compared to one another and 
the longest 5' ESTs for each mRNA were identified. Overlapping cDNAs were assembled 
into continuous sequences (contigs). The resulting continuous sequences were then 
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compared to public databases to gauge their similarity to known sequences, as described in 
Example 21 below. 

EXAMPLE 21 

5 Clustering of the 5' ESTs and Calculation of Novelty Indices for cDNA Libraries 

[0175] For each sequenced EST library, the sequences were clustered by 

the 5' end. Each sequence in the library was compared to the others with BLASTN2 (direct 
strand, parameters S=107). ESTs with High Scoring Segment Pairs (HSPs) at least 25 bp 
long, having 95% identical bases and beginning closer than 10 bp from each EST 5' end 
10 were grouped. The longest sequence found in the cluster was used as representative of the 
cluster. A global clustering between libraries was then performed leading to the definition 
of super-contigs. 

[0176] To assess the yield of new sequences within the EST libraries, a 

novelty rate (NR) was defined as: NR= 1 00 X (Number of new unique sequences found in 
15 the library/Total number of sequences from the library). Typically, novelty rating range 

between 10% and 41% depending on the tissue from which the EST library was obtained. 
For most of the libraries, the random sequencing of 5 r EST libraries was pursued until the 
novelty rate reached 20%. 

[0177] Following characterization as described above, the collection of 5 r 

20 ESTs in NETGENE™ was screened to identify those 5 r ESTs bearing potential signal 
sequences as described in Example 22 below. 

EXAMPLE 22 

Identification of Potential Signal Sequences in 5 r ESTs 

25 [0178] The 5' ESTs in the NETGENE™ database were screened to 

identify those having an uninterrupted open reading frame (ORF) longer than 45 
nucleotides beginning with an ATG codon and extending to the end of the EST. 
Approximately half of the cDNA sequences in NETGENE™ contained such an ORF. The 
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ORFs of these 5' ESTs were searched to identify potential signal motifs using slight 
modifications of the procedures disclosed in Von Heijne, G. A New Method for Predicting 
Signal Sequence Cleavage Sites. Nucleic Acids Res. 14:4683-4690 (1986), the disclosure 
of which is incorporated herein by reference. Those 5' EST sequences encoding a 15 
5 amino acid long stretch with a score of at least 3.5 in the Von Heijne signal peptide 
identification matrix were considered to possess a signal sequence. Those 5' ESTs which 
matched a known human mRNA or EST sequence and had a 5' end more than 20 
nucleotides downstream of the known 5' end were excluded from further analysis. The 
remaining cDNAs having signal sequences therein were included in a database called 
10 SIGNALTAG™. 

[0179] To confiim the accuracy of the above method for identifying signal 

sequences, the analysis of Example 23 was performed. 

EXAMPLE 23 

15 Confirmation of Accuracy of Identification of Potential Signal Sequences in 5' ESTs 

[0180] The accuracy of the above procedure for identifying signal 

sequences encoding signal peptides was evaluated by applying the method to the 43 amino 
terminal amino acids of all human SwissProt proteins. The computed Von Heijne score 
for each protein was compared with the known characterization of the protein as being a 
20 secreted protein or a non-secreted protein. In this manner, the number of non-secreted 

proteins having a score higher than 3.5 (false positives) and the number of secreted 
proteins having a score lower than 3.5 (false negatives) could be calculated. 

[0181] Using the results of the above analysis, the probability that a 

peptide encoded by the 5' region of the mRNA is in fact a genuine signal peptide based on 
25 its Von Heijne f s score was calculated based on either the assumption that 10% of human 

proteins are secreted or the assumption that 20% of human proteins are secreted. The 
results of this analysis are shown in Figures 2 and 3. 

[0182] Using the above method of identifying secretory proteins, 5' ESTs 

for human glucagon, gamma interferon induced monokine precursor, secreted cyclophilin- 
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like protein, human pleiotropin, and human biotinidase precursor all of which are 
polypeptides which are known to be secreted, were obtained. Thus, the above method 
successfully identified those 5 r ESTs which encode a signal peptide. 

[0183] To confirm that the signal peptide encoded by the 5' ESTs actually 

5 functions as a signal peptide, the signal sequences from the 5' ESTs may be cloned into a 
vector designed for the identification of signal peptides. Some signal peptide identification 
vectors are designed to confer the ability to grow in selective medium on host cells which 
have a signal sequence operably inserted into the vector. For example, to confirm that a 5' 
EST encodes a genuine signal peptide, the signal sequence of the 5' EST may be inserted 
10 upstream and in frame with a non-secreted form of the yeast invertase gene in signal 
peptide selection vectors such as those described in U.S. Patent No. 5,536,637, the 
disclosure of which is incorporated herein by reference. Growth of host cells containing 
signal sequence selection vectors having the signal sequence from the 5' EST inserted 
therein confirms that the 5' EST encodes a genuine signal peptide. 

15 [0184] Alternatively, the presence of a signal peptide may be confirmed by 

cloning the extended cDNAs obtained using the ESTs into expression vectors such as 
pXTl (as described below), or by constructing promoter-signal sequence-reporter gene 
vectors which encode fusion proteins between the signal peptide and an assayable reporter 
protein. After introduction of these vectors into a suitable host cell, such as COS cells or 

20 NIH 3T3 cells, the growth medium may be harvested and analyzed for the presence of the 
secreted protein. The medium from these cells is compared to the medium from cells 
containing vectors lacking the signal sequence or extended cDNA insert to identify vectors 
which encode a functional signal peptide or an authentic secreted protein. 

[0185] Those 5' ESTs which encoded a signal peptide, as determined by 

25 the method of Example 22 above, were further grouped into four categories based on their 

homology to known sequences. The categorization of the 5' ESTs is described in Example 
24 below. 
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EXAMPLE 24 

Categorization of 5' ESTs Encoding a Signal Peptide 

[0186] Those 5' ESTs having a sequence not matching any known 

vertebrate sequence nor any publicly available EST sequence were designated "new." Of 
5 the sequences in the SIGNALTAG™ database, 947 of the 5' ESTs having a Von Heijne's 
score of at least 3.5 fell into this category. 

[0187] Those 5' ESTs having a sequence not matching any vertebrate 

sequence but matching a publicly known EST were designated "EST-ext", provided that 
the known EST sequence was extended by at least 40 nucleotides in the 5 r direction. Of 
10 the sequences in the SIGNALTAG™ database, 150 of the 5' ESTs having a Von Heijne's 
score of at least 3.5 fell into this category. 

[0188] Those ESTs not matching any vertebrate sequence but matching a 

publicly known EST without extending the known EST by at least 40 nucleotides in the 5' 
direction were designated "EST." Of the sequences in the SIGNALTAG™ database, 599 
15 of the 5' ESTs having a Von Heijne's score of at least 3.5 fell into this category. 

[0189] Those 5' ESTs matching a human mRNA sequence but extending 

the known sequence by at least 40 nucleotides in the 5 r direction were designated "VERT- 
ext" Of the sequences in the SIGNALTAG™ database, 23 of the 5' ESTs having a Von 
Heijne's score of at least 3.5 fell into this category. Included in this category was a 5' EST 
20 which extended the known sequence of the human translocase mRNA by more than 200 
bases in the 5' direction. A 5' EST which extended the sequence of a human tumor 
suppressor gene in the 5' direction was also identified. 

[0190] Figure 4 shows the distribution of 5' ESTs in each category and the 

number of 5 f ESTs in each category having a given minimum von Heijne's score. 

25 [0191] Each of the 5 f ESTs was categorized based on the tissue from 

which its corresponding mRNA was obtained, as described below in Example 25. 
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EXAMPLE 25 

Categorization of Expression Patterns 

[0192] Figure 5 shows the tissues from which the mRNAs corresponding 

to the 5' ESTs in each of the above described categories were obtained. 

5 [0193] In addition to categorizing the 5' ESTs by the tissue from which the 

cDNA library in which they were first identified was obtained, the spatial and temporal 
expression patterns of the mRNAs corresponding to the 5' ESTs, as well as their 
expression levels, may be determined as described in Example 26 below. Characterization 
of the spatial and temporal expression patterns and expression levels of these mRNAs is 
10 useful for constructing expression vectors capable of producing a desired level of gene 
product in a desired spatial or temporal manner, as will be discussed in more detail below. 

[0194] In addition, 5' ESTs whose corresponding mRNAs are associated 

with disease states may also be identified. For example, a particular disease may result 
from lack of expression, over expression, or under expression of an mRNA corresponding 
15 to a 5 r EST. By comparing mRNA expression patterns and quantities in samples taken 
from healthy individuals with those from individuals suffering from a particular disease, 5' 
ESTs responsible for the disease may be identified. 

[0195] It will be appreciated that the results of the above characterization 

procedures for 5' ESTs also apply to extended cDNAs (obtainable as described below) 
20 which contain sequences adjacent to the 5' ESTs. It will also be appreciated that if it is 
desired to defer characterization until extended cDNAs have been obtained rather than 
characterizing the ESTs themselves, the above characterization procedures can be applied 
to characterize the extended cDNAs after their isolation. 
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EXAMPLE 26 

Evaluation of Expression Levels and Patterns of mRNAs 

Corresponding to 5' ESTs or Extended cDNAs 

[0196] Expression levels and patterns of mRNAs corresponding to 5' ESTs 

5 or extended cDNAs (obtainable as described below) may be analyzed by solution 
hybridization with long probes as described in International Patent Application No. WO 
97/05277, the entire contents of which are hereby incorporated by reference. Briefly, a 5' 
EST, extended cDNA, or fragment thereof corresponding to the gene encoding the mRNA 
to be characterized is inserted at a cloning site immediately downstream of a bacteriophage 

10 (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA. Preferably, the 5' 
EST or extended cDNA has 100 or more nucleotides. The plasmid is linearized and 
transcribed in the presence of ribonucleotides comprising modified ribonucleotides (i.e. 
biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in 
solution with mRNA isolated from cells or tissues of interest. The hybridizations are 

15 performed under standard stringent conditions (40-50°C for 16 hours in an 80% 
formamide, 0.4 M NaCl buffer, pH 7-8). The unhybridized probe is removed by digestion 
with ribonucleases specific for single-stranded RNA (i.e. RNases CL3, Tl, Phy M, U2 or 
A). The presence of the biotin-UTP modification enables capture of the hybrid on a 
microtitration plate coated with streptavidin. The presence of the DIG modification 

20 enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody 
coupled to alkaline phosphatase. 

[0197] The 5' ESTs, extended cDNAs, or fragments thereof may also be 

tagged with nucleotide sequences for the serial analysis of gene expression (SAGE) as 
disclosed in UK Patent Application No. 2 305 241 A, the entire contents of which are 

25 incorporated by reference. In this method, cDNAs are prepared from a cell, tissue, 

organism or other source of nucleic acid for which it is desired to determine gene 
expression patterns. The resulting cDNAs are separated into two pools. The cDNAs in 
each pool are cleaved with a first restriction endonuclease, called an "anchoring enzyme/' 
having a recognition site which is likely to be present at least once in most cDNAs. The 

30 fragments which contain the 5' or 3' most region of the cleaved cDNA are isolated by 
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binding to a capture medium such as streptavidin coated beads. A first oligonucleotide 
linker having a first sequence for hybridization of an amplification primer and an internal 
restriction site for a "tagging endonuclease" is ligated to the digested cDNAs in the first 
pool. Digestion with the second endonuclease produces short "tag" fragments from the 
5 cDNAs. 

[0198] A second oligonucleotide having a second sequence for 

hybridization of an amplification primer and an internal restriction site is ligated to the 
digested cDNAs in the second pool. The cDNA fragments in the second pool are also 
digested with the "tagging endonuclease" to generate short "tag" fragments derived from 

10 the cDNAs in the second pool. The "tags" resulting from digestion of the first and second 
pools with the anchoring enzyme and the tagging endonuclease are ligated to one another 
to produce "ditags." In some embodiments, the ditags are concatamerized to produce 
ligation products containing from 2 to 200 ditags. The tag sequences are then determined 
and compared to the sequences of the 5' ESTs or extended cDNAs to determine which 5' 

15 ESTs or extended cDNAs are expressed in the cell, tissue, organism, or other source of 
nucleic acids from which the tags were derived. In this way, the expression pattern of the 
5' ESTs or extended cDNAs in the cell, tissue, organism, or other source of nucleic acids is 
obtained. 

[0199] Quantitative analysis of gene expression may also be performed 

20 using arrays. As used herein, the term array means a one dimensional, two dimensional, or 
multidimensional arrangement of full length cDNAs (i.e. extended cDNAs which include 
the coding sequence for the signal peptide, the coding sequence for the mature protein, and 
a stop codon), extended cDNAs, 5' ESTs or fragments of the full length cDNAs, extended 
cDNAs, or 5' ESTs of sufficient length to permit specific detection of gene expression. 
25 Preferably, the fragments are at least 15 nucleotides in length. More preferably, the 
fragments are at least 100 nucleotides in length. More preferably, the fragments are more 
than 100 nucleotides in length. In some embodiments the fragments may be more than 
500 nucleotides in length. 

[0200] For example, quantitative analysis of gene expression may be 

30 performed with full length cDNAs, extended cDNAs, 5' ESTs, or fragments thereof in a 
complementary DNA microarray as described by Schena et aL {Science 270:467-470, 
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1995; Proa Natl Acad. Set U.S.A. 93:10614-10619, 1996). Full length cDNAs, extended 
cDNAs, 5' ESTs or fragments thereof are amplified by PCR and arrayed from 96-well 
microtiter plates onto silylated microscope slides using high-speed robotics. Printed arrays 
are incubated in a humid chamber to allow rehydration of the array elements and rinsed, 
5 once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium 
borohydride solution. The arrays are submerged in water for 2 min at 95°C f transferred 
into 0.2% SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25°C. 

[0201] Cell or tissue mRNA is isolated or commercially obtained and 

probes are prepared by a single round of reverse transcription. Probes are hybridized to 1 

10 cm 2 microarrays under a 14 x 14 mm glass coverslip for 6-12 hours at 60°C. Arrays are 
washed for 5 min at 25°C in low stringency wash buffer (1 x SSC/0.2% SDS), then for 10 
min at room temperature in high stringency wash buffer (0.1 x SSC/0.2% SDS). Arrays 
are scanned in 0.1 x SSC using a fluorescence laser scanning device fitted with a custom 
filter set. Accurate differential expression measurements are obtained by taking the 

1 5 average of the ratios of two independent hybridizations. 

[0202] Quantitative analysis of the expression of genes may also be 

performed with full length cDNAs, extended cDNAs, 5' ESTs, or fragments thereof in 
complementary DNA arrays as described by Pietu et al. (Genome Research 6:492-503, 
1996). The full length cDNAs, extended cDNAs, 5' ESTs or fragments thereof are PCR 
20 amplified and spotted on membranes. Then, mRNAs originating from various tissues or 
cells are labeled with radioactive nucleotides. After hybridization and washing in 
controlled conditions, the hybridized mRNAs are detected by phospho-imaging or 
autoradiography. Duplicate experiments are performed and a quantitative analysis of 
differentially expressed mRNAs is then performed. 

25 [0203] Alternatively, expression analysis of the 5 r ESTs or extended 

cDNAs can be done through high density nucleotide arrays as described by Lockhart et al 
(Nature Biotechnology 14: 1675-1680, 1996) and Sosnowsky et al. (Proc. Natl. Acad. Sci. 
94:1119-1123, 1997). Oligonucleotides of 15-50 nucleotides corresponding to sequences 
of the 5' ESTs or extended cDNAs are synthesized directly on the chip (Lockhart et al., 
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supra) or synthesized and then addressed to the chip (Sosnowski et al, supra). Preferably, 
the oligonucleotides are about 20 nucleotides in length. 

[0204] cDNA probes labeled with an appropriate compound, such as 

biotin, digoxigenin or fluorescent dye, are synthesized from the appropriate mRNA 

5 population and then randomly fragmented to an average size of 50 to 100 nucleotides. The 
said probes are then hybridized to the chip. After washing as described in Lockhart et al., 
supra and application of different electric fields (Sosnowsky et al., Proc. Natl. Acad. Sci. 
94:1119-1123)., the dyes or labeling compounds are detected and quantified. Duplicate 
hybridizations are performed. Comparative analysis of the intensity of the signal 

10 originating from cDNA probes on the same target oligonucleotide in different cDNA 

samples indicates a differential expression of the mRNA corresponding to the 5' EST or 
extended cDNA from which the oligonucleotide sequence has been designed. 

III. Use of 5' ESTs to Clone Extended cDNAs and to Clone the Corresponding 
15 Genomic DNAs 

[0205] Once 5' ESTs which include the 5' end of the corresponding 

mRNAs have been selected using the procedures described above, they can be utilized to 
isolate extended cDNAs which contain sequences adjacent to the 5' ESTs. The extended 
cDNAs may include the entire coding sequence of the protein encoded by the 

20 corresponding mRNA, including the authentic translation start site, the signal sequence, 
and the sequence encoding the mature protein remaining after cleavage of the signal 
peptide. Such extended cDNAs are referred to herein as "full length cDNAs." 
Alternatively, the extended cDNAs may include only the sequence encoding the mature 
protein remaining after cleavage of the signal peptide, or only the sequence encoding the 

25 signal peptide. 

[0206] Example 27 below describes a general method for obtaining 

extended cDNAs. Example 28 below describes the cloning and sequencing of several 
extended cDNAs, including extended cDNAs which include the entire coding sequence 
and authentic 5' end of the corresponding mRNA for several secreted proteins. 
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[0207] The methods of Examples 27, 28, and 29 can also be used to obtain 

extended cDNAs which encode less than the entire coding sequence of the secreted 
proteins encoded by the genes corresponding to the 5' ESTs. In some embodiments, the 
extended cDNAs isolated using these methods encode at least 10 amino acids of one of the 
5 proteins encoded by the sequences of SEQ ID NOs: 40-84 and 130-154. In further 
embodiments, the extended cDNAs encode at least 20 amino acids of the proteins encoded 
by the sequences of SEQ ID NOs: 40-84 and 130-154. hi further embodiments, the 
extended cDNAs encode at least 30 amino amino acids of the sequences of SEQ ID NOs: 
40-84 and 130-154. In a preferred embodiment, the extended cDNAs encode a full length 
10 protein sequence, which includes the protein coding sequences of SEQ ID NOs: 40-84 and 
130-154. 

EXAMPLE 27 

General Method for Using 5' ESTs to Clone and Sequence Extended cDNAs 

15 [0208] The following general method has been used to quickly and 

efficiently isolate extended cDNAs including sequence adjacent to the sequences of the 5' 
ESTs used to obtain them. This method may be applied to obtain extended cDNAs for any 
5' EST in the NETGENE™ database, including those 5' ESTs encoding secreted proteins. 
The method is summarized in Figure 6. 

20 1. Obtaining Extended cDNAs 

a) First strand synthesis 

[0209] The method takes advantage of the known 5 f sequence of the 

mRNA. A reverse transcription reaction is conducted on purified mRNA with a poly 14dT 
primer containing a 49 nucleotide sequence at its 5' end allowing the addition of a known 
25 sequence at the end of the cDNA which corresponds to the 3 1 end of the mRNA. For 
example, the primer may have the following sequence: 5-ATC GTT GAG ACT CGT 
ACC AGC AGA GTC ACG AGA GAG ACT ACA CGG TAC TGG TTT TTT TTT TTT 
TTVN -y (SEQ ID NO:14). Those skilled in the art will appreciate that other sequences 
may also be added to the poly dT sequence and used to prime the first strand synthesis. 
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Using this primer and a reverse transcriptase such as the Superscript II (Gibco BRL) or 
Rnase H Minus M-MLV (Promega) enzyme, a reverse transcript anchored at the 3' polyA 
site of the RNAs is generated. 

[0210] After removal of the mRNA hybridized to the first cDNA strand by 

5 alkaline hydrolysis, the products of the alkaline hydrolysis and the residual poly dT primer 
are eliminated with an exclusion column such as an AcA34 (Biosepra) matrix as explained 
in Example 11. 

b) Second strand synthesis 

[0211] A pair of nested primers on each end is designed based on the 

10 known 5 r sequence from the 5 f EST and the known 3 f end added by the poly dT primer 
used in the first strand synthesis. Software used to design primers are either based on GC 
content and melting temperatures of oligonucleotides, such as OSP (Illier and Green, PCR 
Metk AppL 1:124-128, 1991), or based on the octamer frequency disparity method 
(Griffais et al., Nucleic Acids Res, 19: 3887-3891, 1991 such as PC-Rare 
15 (http://bioinformatics.weizma^ 

[0212] Preferably, the nested primers at the 5 r end are separated from one 

another by four to nine bases. The 5* primer sequences may be selected to have melting 
temperatures and specificities suitable for use in PCR. 

[0213] Preferably, the nested primers at the 3' end are separated from one 

20 another by four to nine bases. For example, the nested 3 r primers may have the following 
sequences: (5'- CCA GCA GAG TCA CGA GAG AGA CTA CAC GG -3 ? (SEQ ID 
NO: 15), and 5'- CAC GAG AGA GAC TAC ACG GTA CTG G -3' (SEQ ED NO: 16). 
These primers were selected because they have melting temperatures and specificities 
compatible with their use in PCR. However, those skilled in the art will appreciate that 
25 other sequences may also be used as primers. 

[0214] The first PCR run of 25 cycles is performed using the Advantage 

Tth Polymerase Mix (Clontech) and the outer primer from each of the nested pairs. A 
second 20 cycle PCR using the same enzyme and the inner primer from each of the nested 
pairs is then performed on 1/2500 of the first PCR product. Thereafter, the primers and 
30 nucleotides are removed. 
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2. Sequencing of Full Length Extended cDNAs or Fragments Thereof 

[0215] Due to the lack of position constraints on the design of 5' nested 

primers compatible for PCR use using the OSP software, amplicons of two types are 
5 obtained. Preferably, the second 5' primer is located upstream of the translation initiation 
codon thus yielding a nested PCR product containing the whole coding sequence. Such a 
full length extended cDNA undergoes a direct cloning procedure as described in section a 
below. However, in some cases, the second 5 ? primer is located downstream of the 
translation initiation codon, thereby yielding a PCR product containing only part of the 
10 ORF. Such incomplete PCR products are submitted to a modified procedure described in 
section b below. 

a) Nested PCR products containing complete ORFs 

[0216] When the resulting nested PCR product contains the complete 

coding sequence, as predicted from the 5 ? EST sequence, it is cloned in an appropriate 
15 vector such as pED6dpc2, as described in section 3. 

b) Nested PCR products containing incomplete ORFs 

[0217] When the amplicon does not contain the complete coding 

sequence, intermediate steps are necessary to obtain both the complete coding sequence 
and a PCR product containing the full coding sequence. The complete coding sequence 
20 can be assembled from several partial sequences determined directly from different 

PCR products as described in the following section. 

[0218] Once the full coding sequence has been completely determined, 

new primers compatible for PCR use are designed to obtain amplicons containing the 
whole coding region. However, in such cases, 3 5 primers compatible for PCR use are 
25 located inside the 3 ? UTR of the corresponding mRNA, thus yielding amplicons which 

lack part of this region, i.e. the polyA tract and sometimes the polyadenylation signal, as 
illustrated in figure 6. Such full length extended cDNAs are then cloned into an 
appropriate vector as described in section 3. 
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c) Sequencing extended cDNAs 

[0219] Sequencing of extended cDNAs can be performed using a Die 

Terminator approach with the AmpliTaq DNA polymerase FS kit available from Perkin 
Elmer. 

5 [0220] In order to sequence PCR fragments, primer walking is performed 

using software such as OSP to choose primers and automated computer software such as 
ASMG (Sutton et al. ? Genome Science Technol 1: 9-19, 1995) to construct contigs of 
walking sequences including the initial 5' tag using minimum overlaps of 32 nucleotides. 
Preferably, primer walking is performed until the sequences of full length cDNAs are 
10 obtained. 

[0221] Completion of the sequencing of a given extended cDNA 

fragment is assessed as follows. Since sequences located after a polyA tract are difficult 
to determine precisely in the case of uncloned products, sequencing and primer walking 
processes for PCR products are interrupted when a polyA tract is identified in extended 

15 cDNAs obtained as described in case b. The sequence length is compared to the size of 

the nested PCR product obtained as described above. Due to the limited accuracy of the 
determination of the PCR product size by gel electrophoresis, a sequence is considered 
complete if the size of the obtained sequence is at least 70 % the size of the first nested 
PCR product. If the length of the sequence determined from the computer analysis is not 

20 at least 70% of the length of the nested PCR product, these PCR products are cloned and 
the sequence of the insertion is determined. When Northern blot data are available, the 
size of the mRNA detected for a given PCR product is used to finally assess that the 
sequence is complete. Sequences which do not fulfill the above criteria are discarded and 
will undergo a new isolation procedure. 

25 [0222] Sequence data of all extended cDNAs are then transferred to a 

proprietary database, where quality controls and validation steps are carried out as 
described in example 15. 

3. Cloning of Full Length Extended cDNAs 

[0223] The PCR product containing the full coding sequence is then 

30 cloned in an appropriate vector. For example, the extended cDNAs can be cloned into the 
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expression vector pED6dpc2 (DiscoverEase, Genetics Institute, Cambridge, MA) as 
follows. The structure of pED6dpc2 is shown in Figure 7. pED6dpc2 vector DNA is 
prepared with blunt ends by performing an EcoRI digestion followed by a fill in reaction. 
The blunt ended vector is dephosphorylated. After removal of PCR primers and ethanol 
5 precipitation, the PCR product containing the full coding sequence or the extended cDNA 
obtained as described above is phosphorylated with a kinase subsequently removed by 
phenol-Sevag extraction and precipitation. The double stranded extended cDNA is then 
ligated to the vector and the resulting expression plasmid introduced into appropriate host 
cells. 

10 [0224] Since the PCR products obtained as described above are blunt 

ended molecules that can be cloned in either direction, the orientation of several clones for 
each PCR product is determined. Then, 4 to 10 clones are ordered in microtiter plates and 
subjected to a PCR reaction using a first primer located in the vector close to the cloning 
site and a second primer located in the portion of the extended cDNA corresponding to the 

15 3 r end of the mRNA. This second primer may be the antisense primer used in anchored 

PCR in the case of direct cloning (case a) or the antisense primer located inside the 3TJTR 
in the case of indirect cloning (case b). Clones in which the start codon of the extended 
cDNA is operably linked to the promoter in the vector so as to permit expression of the 
protein encoded by the extended cDNA are conserved and sequenced. In addition to the 

20 ends of cDNA inserts, approximately 50 bp of vector DNA on each side of the cDNA 
insert are also sequenced. 

[0225] The cloned PCR products are then entirely sequenced according to 

the aforementioned procedure. In this case, contig assembly of long fragments is then 
performed on walking sequences that have already contigated for uncloned PCR products 
25 during primer walking. Sequencing of cloned amplicons is complete when the resulting 

contigs include the whole coding region as well as overlapping sequences with vector 
DNA on both ends. 

4. Computer Analysis of Full Length Extended cDNA 

[0226] Sequences of all full length extended cDNAs may then be 

30 subjected to further analysis as described below and using the parameters found in 

Table II with the following modifications. For screening of miscellaneous subdivisions 
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of Genbank, FAST A was used instead of BLASTN and 15 nucleotide of homology was 
the limit instead of 17. For Alu detection, BLASTN was used with the following 
parameters: S=72; identity=70%; and length = 40 nucleotides. Polyadenylation signal 
and polyA tail which were not search for the 5' ESTs were searched. For 
5 polyadenylation signal detection the signal (AATAAA) was searched with one 
permissible mismatch in the last fifty nucleotides preceding the 5 r end of the polyA. For 
the polyA, a stretch of 8 amino acids in the last 20 nucleotides of the sequence was 
searched with BLAST2N in the sense strand with the following parameters (W=6, 
S=40, E=1000, and identity=90%). Finally, patented sequences and ORF homologies 
10 were searched using, respectively, BLASTN and BLASTP on GenSEQ (Derwent's 
database of patented nucleotide sequences) and SWISSPROT for ORFs with the 
following parameters (W=8 and B=10). Before examining the extended full length 
cDNAs for sequences of interest, extended cDNAs which are not of interest are 
searched as follows. 

15 a) Elimination of undesired sequences 

[0227] Although 5 'ESTs were checked to remove contaminants 

sequences as described in Example 18, a last verification was carried out to identify 
extended cDNAs sequences derived from undesired sequences such as vector RNAs, 
transfer RNAs, ribosomal rRNAs, mitochondrial RNAs, prokaryotic RNAs and fungal 
20 RNAs using the FASTA and BLASTN programs on both strands of extended cDNAs as 
described below. 

[0228] To identify the extended cDNAs encoding vector RNAs, 

extended cDNAs are compared to the known sequences of vector RNA using the 
FASTA program. Sequences of extended cDNAs with more than 90% homology over 
25 stretches of 15 nucleotides are identified as vector RNA. 

[0229] To identify the extended cDNAs encoding tRNAs, extended cDNA 

sequences were compared to the sequences of 1 190 known tRNAs obtained from EMBL 
release 38, of which 100 were human. Sequences of extended cDNAs having more than 
80% homology over 60 nucleotides using FASTA were identified as tRNA. 
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[0230] To identify the extended cDNAs encoding rRNAs, extended cDNA 

sequences were compared to the sequences of 2497 known rRNAs obtained from EMBL 
release 38, of which 73 were human. Sequences of extended cDNAs having more than 
80% homology over stretches longer than 40 nucleotides using BLASTN were identified 
as rRNAs. 

[0231] To identify the extended cDNAs encoding mtRNAs, extended 

cDNA sequences were compared to the sequences of the two known mitochondrial 
genomes for which the entire genomic sequences are available and all sequences 
transcribed from these mitochondrial genomes including tRNAs, rRNAs, and mRNAs for 
a total of 38 sequences. Sequences of extended cDNAs having more than 80% homology 
over stretches longer than 40 nucleotides using BLASTN were identified as mtRNAs. 

[0232] Sequences which might have resulted from other exogenous 

contaminants were identified by comparing extended cDNA sequences to release 105 of 
Genbank bacterial and fungal divisions. Sequences of extended cDNAs having more than 
90% homology over 40 nucleotides using BLASTN were identified as exogenous 
prokaryotic or fungal contaminants. 

[0233] In addition, extended cDNAs were searched for different repeat 

sequences, including Alu sequences, LI sequences, THE and MER repeats, SSTR 
sequences or satellite, micro-satellite, or telomeric repeats. Sequences of extended 
cDNAs with more than 70% homology over 40 nucleotide stretches using BLASTN 
were identified as repeat sequences and masked in further identification procedures. In 
addition, clones showing extensive homology to repeats , i.e., matches of either more 
than 50 nucleotides if the homology was at least 75% or more than 40 nucleotides if the 
homology was at least 85% or more than 30 nucleotides if the homology was at least 
90%, were flagged. 

b) Identification of structural features 

[0234] Structural features, e.g. polyA tail and polyadenylation signal, of 

the sequences of full length extended cDNAs are subsequently determined as follows. 

[0235] A polyA tail is defined as a homopolymeric stretch of at least 1 1 

A with at most one alternative base within it. The polyA tail search is restricted to the 
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last 20 nt of the sequence and limited to stretches of 11 consecutive A's because 
sequencing reactions are often not readable after such a polyA stretch. Stretches with 
100% homology over 6 nucleotides are identified as polyA tails. 

[0236] To search for a polyadenylation signal, the polyA tail is clipped 

5 from the full-length sequence. The 50 bp preceding the polyA tail are first searched for the 
canonic polyadenylation AAUAAA signal and, if the canonic signal is not detected, for the 
alternative AUUAAA signal (Sheets et al., Nuc. Acids Res. 18: 5799-5805, 1990). If 
neither of these consensus polyadenylation signals is found, the canonic motif is searched 
again allowing one mismatch to account for possible sequencing errors. More than 85 % 

10 of identified polyadenylation signals of either type actually ends 10 to 30 bp from the 
polyA tail. Alternative AUUAAA signals represents approximately 15 % of the total 
number of identified polyadenylation signals. 

[0237] To search for a polyadenylation signal, the polyA tail is clipped 

from the full-length sequence. The 50 bp preceding the polyA tail are searched for the 

15 canonic polyadenylation AAUAAA signal allowing one mismatch to account for possible 
sequencing errors and known variation in the canonical sequence of the polyadenylation 
signal. 

c) Identification of functional features 

[0238] Functional features, e.g. ORFs and signal sequences, of the 

20 sequences of full length extended cDNAs were subsequently determined as follows. 

[0239] The 3 upper strand frames of extended cDNAs are searched for 

ORFs defined as the maximum length fragments beginning with a translation initiation 
codon and ending with a stop codon. ORFs encoding at least 20 amino acids are preferred. 
[0240] Each found ORF is then scanned for the presence of a signal 

25 peptide in the first 50 amino-acids or, where appropriate, within shorter regions down to 20 
amino acids or less in the ORF, using the matrix method of von Heijne (Nuc. Acids Res. 
14: 4683-4690 (1986)), the disclosure of which is incorporated herein by reference and the 
modification described in Example 22. 
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d) Homology to either nucleotidic or proteic sequences 

[0241] Sequences of foil length extended cDNAs are then compared to 

known sequences on a nucleotidic or proteic basis. 

[0242] Sequences of full length extended cDNAs are compared to the 

5 following known nucleic acid sequences: vertebrate sequences, EST sequences ? patented 
sequences and recently identified sequences available at the time of filing the priority 
documents. Full length cDNA sequences are also compared to the sequences of a private 
database (Genset internal sequences) in order to find sequences that have already been 
identified by applicants. Sequences of full length extended cDNAs with more than 90% 
10 homology over 30 nucleotides using either BLASTN or BLAST2N as indicated in Table 
III are identified as sequences that have already been described. Matching vertebrate 
sequences are subsequently examined using FASTA; foil length extended cDNAs with 
more than 70% homology over 30 nucleotides are identified as sequences that have 
already been described. 

15 [0243] ORFs encoded by foil length extended cDNAs as defined in 

section c) are subsequently compared to known amino acid sequences found in public 
databases using Swissprot, PIR and Genptept releases available at the time of filing the 
priority documents for the present application. These analyses were performed using 
BLASTP with the parameter W=8 and allowing a maximum of 10 matches. Sequences 

20 of full length extended cDNAs showing extensive homology to known protein 

sequences are recognized as already identified proteins. 

[0244] In addition, the three-frame conceptual translation products of the 

top strand of foil length extended cDNAs are compared to publicly known amino acid 
sequences of Swissprot using BLASTX with the parameter E=0.0OL Sequences of full 
25 length extended cDNAs with more than 70% homology over 30 amino acid stretches 

are detected as already identified proteins. 

[0245] As used herein the term "cDNA codes of SEQ ID NOs. 40-84 

and 130-154" encompasses the nucleotide sequences of SEQ ID NOs. 40-84 and 130- 
154, fragments of SEQ ID NOs. 40-84 and 130-154, nucleotide sequences homologous to 
30 SEQ ID NOs. 40-84 and 130-154 or homologous to fragments of SEQ ID NOs. 40-84 
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and 130-154, and sequences complementary to all of the preceding sequences. The 
fragments include portions of SEQ ID NOs. 40-84 and 130-154 comprising at least 10, 
15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive nucleotides of 
SEQ ID NOs. 40-84 and 130-154. Preferably, the fragments are novel fragments. 
5 Homologous sequences and fragments of SEQ ID NOs. 40-84 and 130-154 refer to a 
sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology 
to these sequences. Homology may be determined using any of the computer programs 
and parameters described herein, including BLAST2N with the default parameters or with 
any modified parameters. Homologous sequences also include RNA sequences in which 

10 uridines replace the thymines in the cDNA codes of SEQ ID NOs. 40-84 and 130-154. 
The homologous sequences may be obtained using any of the procedures described herein 
or may result from the correction of a sequencing error as described above. It will be 
appreciated that the cDNA codes of SEQ ID NOs. 40-84 and 130-154 can be represented 
in the traditional single character format (See the inside back cover of Starrier, Lubert. 

15 Biochemistry, 3 rd edition. W. H Freeman & Co., New York.) or in any other format which 

records the identity of the nucleotides in a sequence. 

[0246] As used herein the term "polypeptide codes of SEQ ID NOS. 85- 

129 and 155-179" encompasses the polypeptide sequence of SEQ ID NOs. 85-129 and 
155-179 which are encoded by the extended cDNAs of SEQ ED NOs. 40-84 and 130- 

20 1 54, polypeptide sequences homologous to the polypeptides of SEQ ID NOS. 85-1 29 and 

155-179, or fragments of any of the preceding sequences. Homologous polypeptide 
sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 
90%, 85%, 80%, 75% homology to one of the polypeptide sequences of SEQ ID NOS. 
85-129 and 155-179. Homology may be determined using any of the computer programs 

25 and parameters described herein, including FASTA with the default parameters or with 

any modified parameters. The homologous sequences may be obtained using any of the 
procedures described herein or may result from the correction of a sequencing error as 
described above. The polypeptide fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 
50, 75, 100, or 150 consecutive amino acids of the polypeptides of SEQ ID NOS. 85-129 

30 and 1 55-179. Preferably, the fragments are novel fragments. It will be appreciated that the 
polypeptide codes of the SEQ ID NOS. 85-129 and 155-179 can be represented in the 
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traditional single character format or three letter format (See the inside back cover of 
Starrier, Lubert. Biochemistry, 3 rd edition. W. H Freeman & Co., New York.) or in any 
other format which relates the identity of the polypeptides in a sequence. 

[0247] It will be appreciated by those skilled in the art that the cDNA 

5 codes of SEQ ID NOs. 40-84 and 130-154 and polypeptide codes of SEQ ID NOS. 85- 
129 and 155-179 can be stored, recorded, and manipulated on any medium which can be 
read and accessed by a computer. As used herein, the words "recorded" and "stored" refer 
to a process for storing information on a computer medium. A skilled artisan can readily 
adopt any of the presently known methods for recording information on a computer 

10 readable medium to generate manufactures comprising one or more of the cDNA codes of 
SEQ ID NOs. 40-84 and 130-154, one or more of the polypeptide codes of SEQ ID NOS. 
85-129 and 155-179. Another aspect of the present invention is a computer readable 
medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 cDNA codes of 
SEQ ID NOs. 40-84 and 130-154. Another aspect of the present invention is a computer 

15 readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 
polypeptide codes of SEQ ID NOS. 85-129 and 155-179. 

[0248] Computer readable media include magnetically readable media, 

optically readable media, electronically readable media and magnetic/optical media. For 
example, the computer readable media may be a hard disc, a floppy disc, a magnetic tape, 
20 CD-ROM, DVD, RAM, or ROM as well as other types of other media known to those 

skilled in the art. 

[0249] Embodiments of the present invention include systems, particularly 

computer systems which contain the sequence information described herein. As used 
herein, "a computer system" refers to the hardware components, software components, and 
25 data storage components used to analyze the nucleotide sequences of the cDNA codes of 

SEQ ID NOs. 40-84 and 130-154, or the amino acid sequences of the polypeptide codes 
of SEQ ID NOS. 85-129 and 155-179. The computer system preferably includes the 
computer readable media described above, and a processor for accessing and manipulating 
the sequence data. 

30 [0250] Preferably, the computer is a general purpose system that comprises 

a central processing unit (CPU), one or more data storage components for storing data, and 
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one or more data retrieving devices for retrieving the data stored on the data storage 
components. A skilled artisan can readily appreciate that any one of the currently 
available computer systems are suitable. 

[0251] In one particular embodiment, the computer system includes a 

5 processor connected to a bus which is connected to a main memory (preferably 
implemented as RAM) and one or more data storage devices, such as a hard drive and/or 
other computer readable media having data recorded thereon. In some embodiments, the 
computer system further includes one or more data retrieving devices for reading the data 
stored on the data storage components. The data retrieving device may represent, for 

10 example, a floppy disk drive, a compact disk drive, a magnetic tape drive, etc. In some 
embodiments, the data storage component is a removable computer readable medium such 
as a floppy disk, a compact disk, a magnetic tape, etc. containing control logic and/or data 
recorded thereon. The computer system may advantageously include or be programmed 
by appropriate software for reading the control logic and/or the data from the data storage 

15 component once inserted in the data retrieving device. Software for accessing and 

processing the nucleotide sequences of the cDNA codes of SEQ ID NOs. 40-84 and 130- 
154, or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 85-129 and 
155-179 (such as search tools, compare tools, and modeling tools etc.) may reside in main 
memory during execution. 

20 [0252] In some embodiments, the computer system may further comprise a 

sequence comparer for comparing the above-described cDNA codes of SEQ ID NOs. 40- 
84 and 130-154 or polypeptide codes of SEQ ED NOS. 85-129 and 155-179 stored on a 
computer readable medium to reference nucleotide or polypeptide sequences stored on a 
computer readable medium. A "sequence comparer" refers to one or more programs 

25 which are implemented on the computer system to compare a nucleotide or polypeptide 
sequence with other nucleotide or polypeptide sequences and/or compounds including but 
not limited to peptides, peptidomimetics, and chemicals stored within the data storage 
means. For example, the sequence comparer may compare the nucleotide sequences of the 
cDNA codes of SEQ ED NOs. 40-84 and 130-154, or the amino acid sequences of the 

30 polypeptide codes of SEQ ID NOS. 85-129 and 155-179 stored on a computer readable 
medium to reference sequences stored on a computer readable medium to identify 
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homologies, motifs implicated in biological function, or structural motifs. The various 
sequence comparer programs identified elsewhere in this patent specification are 
particularly contemplated for use in this aspect of the invention. 

[0253] Accordingly, one aspect of the present invention is a computer 

5 system comprising a processor, a data storage device having stored thereon a cDNA 
code of SEQ ID NOs. 40-84 and 130-154 or a polypeptide code of SEQ ID NOS. 85- 
129 and 155-179, a data storage device having retrievably stored thereon reference 
nucleotide sequences or polypeptide sequences to be compared to the cDNA code of 
SEQ ID NOs. 40-84 and 130-154 or polypeptide code of SEQ ID NOS. 85-129 and 

10 155-179 and a sequence comparer for conducting the comparison. The sequence 

comparer may indicate a homology level between the sequences compared or identify 
structural motifs in the above described cDNA code of SEQ ID NOs. 40-84 and 130- 
154 and polypeptide codes of SEQ ID NOS. 85-129 and 155-179 or it may identify 
structural motifs in sequences which are compared to these cDNA codes and 

15 polypeptide codes. In some embodiments, the data storage device may have stored 
thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the cDNA codes of 
SEQ ID NOs. 40-84 and 130-154 or polypeptide codes of SEQ ID NOS. 85-129 and 
155-179. 

[0254] Another aspect of the present invention is a method for determining 

20 the level of homology between a cDNA code of SEQ ID NOs. 40-84 and 130-154 and a 
reference nucleotide sequence, comprising the steps of reading the cDNA code and the 
reference nucleotide sequence through the use of a computer program which determines 
homology levels and determining homology between the cDNA code and the reference 
nucleotide sequence with the computer program. The computer program may be any of a 
25 number of computer programs for determining homology levels, including those 

specifically enumerated below, including BLAST2N with the default parameters or with 
any modified parameters. The method may be implemented using the computer systems 
described above. The method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 
50 of the above described cDNA codes of SEQ ID NOs. 40-84 and 130-154 through use 
30 of the computer program and determining homology between the cDNA codes and 
reference nucleotide sequences. 
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[0255] Alternatively, the computer program may be a computer program 

which compares the nucleotide sequences of the cDNA codes of the present invention, to 
reference nucleotide sequences in order to determine whether the cDNA code of SEQ ID 
NOs. 40-84 and 130-154 differs from a reference nucleic acid sequence at one or more 
positions. Optionally such a program records the length and identity of inserted, deleted or 
substituted nucleotides with respect to the sequence of either the reference polynucleotide 
or the cDNA code of SEQ ID NOs. 40-84 and 130-154. In one embodiment, the 
computer program may be a program which determines whether the nucleotide sequences 
of the cDNA codes of SEQ ID NOs. 40-84 and 130-154 contain a single nucleotide 
polymorphism (SNP) with respect to a reference nucleotide sequence. This single 
nucleotide polymorphism may comprise a single base substitution, insertion, or deletion. 

[0256] Another aspect of the present invention is a method for determining 

the level of homology between a polypeptide code of SEQ ID NOS. 85-129 and 155-179 
and a reference polypeptide sequence, comprising the steps of reading the polypeptide 
code of SEQ ID NOS. 85-129 and 155-179 and the reference polypeptide sequence 
through use of a computer program which determines homology levels and determining 
homology between the polypeptide code and the reference polypeptide sequence using the 
computer program. 

[0257] Accordingly, another aspect of the present invention is a method for 

determining whether a cDNA code of SEQ ID NOs. 40-84 and 130-154 differs at one or 
more nucleotides from a reference nucleotide sequence comprising the steps of reading the 
cDNA code and the reference nucleotide sequence through use of a computer program 
which identifies differences between nucleic acid sequences and identifying differences 
between the cDNA code and the reference nucleotide sequence with the computer 
program. In some embodiments, the computer program is a program which identifies 
single nucleotide polymorphisms. The method may be implemented by the computer 
systems described above. The method may also be performed by reading at least 2, 5, 10, 
15, 20, 25, 30, or 50 of the cDNA codes of SEQ ID NOs. 40-84 and 130-154 and the 
reference nucleotide sequences through the use of the computer program and identifying 
differences between the cDNA codes and the reference nucleotide sequences with the 
computer program. 
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[0258] In other embodiments the computer based system may further 

comprise an identifier for identifying features within the nucleotide sequences of the 
cDNA codes of SEQ ID NOs. 40-84 and 130-154 or the amino acid sequences of the 
polypeptide codes of SEQ ID NOS. 85-129 and 155-179. 

[0259] An "identifier" refers to one or more programs which identifies 

certain features within the above-described nucleotide sequences of the cDNA codes of 
SEQ ID NOs. 40-84 and 130-154 or the amino acid sequences of the polypeptide codes 
of SEQ ID NOS. 85-129 and 155-179. In one embodiment, the identifier may comprise 
a program which identifies an open reading frame in the cDNAs codes of SEQ ID NOs. 
40-84 and 130-154. 

[0260] In another embodiment, the identifier may comprise a molecular 

modeling program which determines the 3-dimensional structure of the polypeptides 
codes of SEQ ID NOS. 85-129 and 155-179. In some embodiments, the molecular 
modeling program identifies target sequences that are most compatible with profiles 
representing the structural environments of the residues in known three-dimensional 
protein structures. (See, e.g., Eisenberg et al., U.S. Patent No. 5,436,850 issued July 25, 
1995). In another technique, the known three-dimensional structures of proteins in a 
given family are superimposed to define the structurally conserved regions in that 
family. This protein modeling technique also uses the known three-dimensional 
structure of a homologous protein to approximate the structure of the polypeptide codes 
of SEQ ID NOS. 85-129 and 155-179. (See e.g., Srinivasan, et al., U.S. Patent 
No. 5,557,535 issued September 17, 1996). Conventional homology modeling 
techniques have been used routinely to build models of proteases and antibodies. 
(Sowdhamini et al., Protein Engineering 10:207, 215 (1997)). Comparative approaches 
can also be used to develop three-dimensional protein models when the protein of 
interest has poor sequence identity to template proteins. In some cases, proteins fold 
into similar three-dimensional structures despite having very weak sequence identities. 
For example, the three-dimensional structures of a number of helical cytokines fold in 
similar three-dimensional topology in spite of weak sequence homology. 

[0261] The recent development of threading methods now enables the 

identification of likely folding patterns in a number of situations where the structural 
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relatedness between target and template(s) is not detectable at the sequence level. 
Hybrid methods, in which fold recognition is performed using Multiple Sequence 
Threading (MST), structural equivalencies are deduced from the threading output using 
a distance geometry program DRAGON to construct a low resolution model, and a full- 
atom representation is constructed using a molecular modeling package such as 
QUANTA. 

[0262] According to this 3-step approach, candidate templates are first 

identified by using the novel fold recognition algorithm MST, which is capable of 
performing simultaneous threading of multiple aligned sequences onto one or more 3-D 
structures. In a second step, the structural equivalencies obtained from the MST output 
are converted into interresidue distance restraints and fed into the distance geometry 
program DRAGON, together with auxiliary information obtained from secondary 
structure predictions. The program combines the restraints in an unbiased manner and 
rapidly generates a large number of low resolution model confirmations. In a third step, 
these low resolution model confirmations are converted into full-atom models and 
subjected to energy minimization using the molecular modeling package QUANTA. 
(See e.g., Aszodi et al, Proteins: Structure, Function, and Genetics, Supplement 1:38-42 
(1997)). 

[0263] The results of the molecular modeling analysis may then be used 

in rational drug design techniques to identify agents which modulate the activity of the 
polypeptide codes of SEQ ID NOS. 85-129 and 155-179. 

[0264] Accordingly, another aspect of the present invention is a method 

of identifying a feature within the cDNA codes of SEQ ID NOs. 40-84 and 130-154 or 
the polypeptide codes of SEQ ID NOS. 85-129 and 155-179 comprising reading the 
cDNA code(s) or the polypeptide code(s) through the use of a computer program which 
identifies features therein and identifying features within the cDNA code(s) or 
polypeptide code(s) with the computer program. In one embodiment, computer program 
comprises a computer program which identifies open reading frames. In a further 
embodiment, the computer program identifies structural motifs in a polypeptide 
sequence. In another embodiment, the computer program comprises a molecular 
modeling program. The method may be performed by reading a single sequence or at 
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least 2, 5, 10, 15, 20, 25, 30, or 50 of the cDNA codes of SEQ ID NOs. 40-84 and 130- 
154 or the polypeptide codes of SEQ ID NOS. 85-129 and 155-179 through the use of 
the computer program and identifying features within the cDNA codes or polypeptide 
codes with the computer program. 

5 [0265] The cDNA codes of SEQ ID NOs. 40-84 and 130-154 or the 

polypeptide codes of SEQ ID NOS. 85-129 and 155-179 maybe stored and manipulated 
in a variety of data processor programs in a variety of formats. For example, the cDNA 
codes of SEQ ID NOs. 40-84 and 130-154 or the polypeptide codes of SEQ ID NOS. 
85-129 and 155-179 may be stored as text in a word processing file, such as 

10 MicrosoftWORD or WORDPERFECT or as an ASCII file in a variety of database 
programs familiar to those of skill in the art, such as DB2, SYBASE, or ORACLE. In 
addition, many computer programs and databases may be used as sequence comparers, 
identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to 
the cDNA codes of SEQ ID NOs. 40-84 and 130-154 or the polypeptide codes of SEQ 

15 ED NOS. 85-129 and 155-179. The following list is intended not to limit the invention but 
to provide guidance to programs and databases which are useful with the cDNA codes of 
SEQ ID NOs. 40-84 and 130-154 or the polypeptide codes of SEQ ID NOS. 85-129 and 
155-179. The programs and databases which may be used include, but are not limited to: 
MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), GeneMine 

20 (Molecular Applications Group), Look (Molecular Applications Group), MacLook 

(Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX 
(Altschul et al, J. Mol. Biol. 215: 403 (1990)), FASTA (Pearson and Lipman, Proc. Natl. 
Acad. Sci. USA, 85: 2444 (1988)), FASTDB (Brutlag et al. Comp. App. Biosci. 6:237- 
245, 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular 

25 Simulations Inc.), Cerius 2 .DBAccess (Molecular Simulations Inc.), HypoGen (Molecular 

Simulations Inc.), Insight II, (Molecular Simulations Inc.), Discover (Molecular 
Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations 
Inc.), DelPhi, (Molecular Simulations Inc.), QuanteMM, (Molecular Simulations Inc.), 
Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS 

30 (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), 
WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular 
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Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular 
Simulations Inc.), the EMBL/Swissprotein database, the MDL Available Chemicals 
Directory database, the MDL Drug Data Report data base, the Comprehensive Medicinal 
Chemistry database, Derwents's World Drug Index database, the BioByteMasterFile 
database, the Genbank database, and the Genseqn database. Many other programs and 
data bases would be apparent to one of skill in the art given the present disclosure. 

[0266] Motifs which may be detected using the above programs include 

sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, 
ubiquitination sites, alpha helices, and beta sheets, signal sequences encoding signal 
peptides which direct the secretion of the encoded proteins, sequences implicated in 
transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, 
substrate binding sites, and enzymatic cleavage sites. 

5. Selection of Cloned Full Length Sequences of t he Present Invention 

[0267] Cloned full length extended cDNA sequences that have already 

been characterized by the aforementioned computer analysis are then submitted to an 
automatic procedure in order to preselect full length extended cDNAs containing 
sequences of interest, 
a) Automatic sequence preselection 

[0268] All complete cloned full length extended cDNAs clipped for vector 

on both ends are considered. First, a negative selection is operated in order to eliminate 
unwanted cloned sequences resulting from either contaminants or PCR artifacts as follows. 
Sequences matching contaminant sequences such as vector RNA, tRNA, mtRNA, rRNA 
sequences are discarded as well as those encoding ORF sequences exhibiting extensive 
homology to repeats as defined in section 4 a). Sequences obtained by direct cloning using 
nested primers on 5' and 3' tags (section 1. case a) but lacking polyA tail are discarded. 
Only ORFs containing a signal peptide and ending either before the polyA tail (case a) or 
before the end of the cloned 3'UTR (case b) are kept. Then, ORFs containing unlikely 
mature proteins such as mature proteins which size is less than 20 amino acids or less than 
25% of the immature protein size are eliminated. 



-67- 



36.US3.DIV 



[0269] In the selection of the ORF, priority was given to the ORF and the 

frame corresponding to the polypeptides described in SignalTag Patents (United States 
Patent Application Serial Nos: 08/905,223; 08/905,135; 08/905,051; 08/905,144; 
08/905,279; 08/904,468; 08/905,134; and 08/905,133). If the ORF was not found among 
the ORFs described in the SignalTag Patents, the ORF encoding the signal peptide with 
the highest score according to Von Heijne method as defined in Example 22 was chosen. 
If the scores were identical, then the longest ORF was chosen. 

[0270] Sequences of full length extended cDNA clones are then compared 

pairwise with BLAST after masking of the repeat sequences. Sequences containing at 
least 90% homology over 30 nucleotides are clustered in the same class. Each cluster is 
then subjected to a cluster analysis that detects sequences resulting from internal priming 
or from alternative splicing, identical sequences or sequences with several frameshifts. 
This automatic analysis serves as a basis for manual selection of the sequences. 

b) Manual sequence selection 

[0271] Manual selection can be carried out using automatically 

generated reports for each sequenced full length extended cDNA clone. During this 
manual procedure, a selection is operated between clones belonging to the same class as 
follows. ORF sequences encoded by clones belonging to the same class are aligned and 
compared. If the homology between nucleotidic sequences of clones belonging to the 
same class is more than 90% over 30 nucleotide stretches or if the homology between 
amino acid sequences of clones belonging to the same class is more than 80% over 20 
amino acid stretches, than the clones are considered as being identical. The chosen 
ORF is the best one according to the criteria mentioned below. If the nucleotide and 
amino acid homologies are less than 90% and 80% respectively, the clones are said to 
encode distinct proteins which can be both selected if they contain sequences of interest. 

[0272] Selection of full length extended cDNA clones encoding sequences 

of interest is performed using the following criteria. Structural parameters (initial tag, 
polyadenylation site and signal) are first checked. Then, homologies with known nucleic 
acids and proteins are examined in order to determine whether the clone sequence match a 
known nucleic/proteic sequence and, in the latter case, its covering rate and the date at 
which the sequence became public. If there is no extensive match with sequences other 
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than ESTs or genomic DNA, or if the clone sequence brings substantial new information, 
such as encoding a protein resulting from alternative slicing of an mRNA coding for an 
already known protein, the sequence is kept. Examples of such cloned full length 
extended cDNAs containing sequences of interest are described in Example 28. 
5 Sequences resulting from chimera or double inserts as assessed by homology to other 
sequences are discarded during this procedure. 

EXAMPLE 28 

Cloning and Sequencing of Extended cDNAs 

10 [0273] The procedure described in Example 27 above was used to obtain 

the extended cDNAs of the present invention. Using this approach, the full length cDNA 
of SEQ ID NO: 17 was obtained. This cDNA falls into the "EST-ext" category described 
above and encodes the signal peptide MKKVLLLITAILAVAVG (SEQ ID NO: 18) 
having a von Heijne score of 8.2. 

15 [0274] The foil length cDNA of SEQ ID NO: 19 was also obtained using 

this procedure. This cDNA falls into the "EST-ext" category described above and encodes 
the signal peptide MWWFQQGLSFLPSALVIWTSA (SEQ ID NO:20) having a von 
Heijne score of 5.5. 

[0275] Another full length cDNA obtained using the procedure described 

20 above has the sequence of SEQ ID NO:21. This cDNA, falls into the "EST-ext" category 
described above and encodes the signal peptide 

MVLTTLPSANSANSPVNMPTTGPNSLSYASSALSPCLT (SEQ ED NO:22) having a 
von Heijne score of 5.9. 

[0276] The above procedure was also used to obtain a full length cDNA 

25 having the sequence of SEQ ID NO:23. This cDNA falls into the "EST-ext" category 

described above and encodes the signal peptide ILSTVTALTFAXA (SEQ ID NO:24) 
having a von Heijne score of 5.5. 
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[0277] The full length cDNA of SEQ ID NO:25 was also obtained using 

this procedure. This cDNA falls into the "new" category described above and encodes a 
signal peptide LVLTLCTLPLAVA (SEQ ID NO:26) having a von Heijne score of 10.1 . 

[0278] The full length cDNA of SEQ ID NO:27 was also obtained using 

5 this procedure. This cDNA falls into the "new" category described above and encodes a 
signal peptide LWLLFFLVTAIHA (SEQ ID NO:28) having a von Heijne score of 10.7. 

[0279] The above procedures were also used to obtain the extended 

cDNAs of the present invention. 5' ESTs expressed in a variety of tissues were obtained 
as described above. The appended sequence listing provides the tissues from which the 
10 extended cDNAs were obtained. It will be appreciated that the extended cDNAs may also 
be expressed in tissues other than the tissue listed in the sequence listing. 

[0280] 5' ESTs obtained as described above were used to obtain extended 

cDNAs having the sequences of SEQ ID NOs: 40-84 and 130-154. Table IV provides the 
sequence identification numbers of the extended cDNAs of the present invention, the 

15 locations of the full coding sequences in SEQ ID NOs: 40-84 and 130-154 (i.e. the 
nucleotides encoding both the signal peptide and the mature protein, listed under the 
heading FCS location in Table IV), the locations of the nucleotides in SEQ ID NOs: 40-84 
and 130-154 which encode the signal peptides (listed under the heading SigPep Location 
in Table IV), the locations of the nucleotides in SEQ ID NOs: 40-84 and 130-154 which 

20 encode the mature proteins generated by cleavage of the signal peptides (listed under the 
heading Mature Polypeptide Location in Table IV), the locations in SEQ ED NOs: 40-84 
and 130-154 of stop codons (listed under the heading Stop Codon Location in Table IV), 
the locations in SEQ ID NOs: 40-84 and 130-154 of polyA signals (listed under the 
heading Poly A Signal Location in Table IV) and the locations of polyA sites (listed under 

25 the heading Poly A Site Location in Table IV). 

[0281] The polypeptides encoded by the extended cDNAs were screened 

for the presence of known structural or functional motifs or for the presence of signatures, 
small amino acid sequences which are well conserved amongst the members of a protein 
family. The conserved regions have been used to derive consensus patterns or matrices 
30 included in the PROSITE data bank, in particular in the file prosite.dat (Release 13.0 of 
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November 1995, located at http://expasy.hcuge.ch/sprot/prosite.html. Prosite_convert and 
prosite_scan programs (http://ulrec3.unil.ch/ftpserveur/prosite_scan) were used to find 
signatures on the extended cDNAs. 

[0282] For each pattern obtained with the prosite__convert program from 

5 the prosite.dat file, the accuracy of the detection on a new protein sequence has been tested 

by evaluating the frequency of irrelevant hits on the population of human secreted proteins 
included in the data bank SWISSPROT. The ratio between the number of hits on shuffled 
proteins (with a window size of 20 amino acids) and the number of hits on native 
(unshuffled) proteins was used as an index. Every pattern for which the ration was greater 

1 o than 20% (one hit on shuffled proteins for 5 hits on native proteins) was skipped during the 

search with prosite_scan. The program used to shuffle protein sequences (db_shuffled) 
and the program used to determine the statistics for each pattern in the protein data banks 
(prosite__statistics) are available on the ftp site http://ukec3.unilxh/ftpserveur/prosite_scan. 

[0283] Table V lists the sequence identification numbers of the 

15 polypeptides of SEQ ID NOs: 85-129 and 155-179, the locations of the amino acid 

residues of SEQ ID NOs: 85-129 and 155-179 in the full length polypeptide (second 
column), the locations of the amino acid residues of SEQ ID NOs: 85-129 and 155-179 in 
the signal peptides (third column), and the locations of the amino acid residues of SEQ ID 
NOs: 85-129 and 155-179 in the mature polypeptide created by cleaving the signal peptide 
20 from the full length polypeptide (fourth column). 

[0284] The nucleotide sequences of the sequences of SEQ ID NOs: 40-84 

and 130-154 and the amino acid sequences encoded by SEQ ID NOs: 40-84 and 130-154 
(i.e. amino acid sequences of SEQ ID NOs: 85-129 and 155-179) are provided in the 
appended sequence listing. In some instances, the sequences are preliminary and may 

25 include some incorrect or ambiguous sequences or amino acids. The sequences of SEQ ID 
NOs: 40-84 and 130-154 can readily be screened for any errors therein and any sequence 
ambiguities can be resolved by resequencing a fragment containing such errors or 
ambiguities on both strands. Sequences containing such errors will generally be at least 
95%, at least 96%, at least 97%, at least 98%, or at least 99% homologous to the sequences 

30 of SEQ ID Nos. 85-129 and 155-179 and such sequences are included in the nucleic acids 

and polypeptides of the present invention. Nucleic acid fragments for resolving 
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sequencing errors or ambiguities may be obtained from the deposited clones or can be 
isolated using the techniques described herein. Resolution of any such ambiguities or 
errors may be facilitated by using primers which hybridize to sequences located close to 
the ambiguous or erroneous sequences. For example, the primers may hybridize to 
sequences within 50-75 bases of the ambiguity or error. Upon resolution of an error or 
ambiguity, the corresponding corrections can be made in the protein sequences encoded by 
the DNA containing the error or ambiguity. The amino acid sequence of the protein 
encoded by a particular clone can also be determined by expression of the clone in a 
suitable host cell, collecting the protein, and determining its sequence. 

[0285] For each amino acid sequence, Applicants have identified what 

they have determined to be the reading frame best identifiable with sequence information 
available at the time of filing. Some of the amino acid sequences may contain "Xaa" 
designators. These "Xaa" designators indicate either (1) a residue which cannot be 
identified because of nucleotide sequence ambiguity or (2) a stop codon in the determined 
sequence where Applicants believe one should not exist (if the sequence were determined 
more accurately). 

[0286] Cells containing the extended cDNAs (SEQ ID NOs: 40-84 and 

130-154) of the present invention in the vector pED6dpc2, are maintained in permanent 
deposit by the inventors at Genset, S.A., 24 Rue Royale, 75008 Paris, France. 

[0287] Pools of cells containing the extended cDNAs (SEQ ID NOs: 40- 

84), from which cells containing a particular polynucleotide are obtainable, were deposited 
with the American Type Culture Collection (ATCC), 10801 University Blvd., Manassas, 
VA, U.S.A., 20110-2209. Each extended cDNA clone has been transfected into separate 
bacterial cells (E-coli) for this composite deposit. Table VI lists the deposit numbers of the 
clones of SEQ ID Nos: 40-84. A pool of cells designated SignalTag 28011999, which 
contains the clones of SEQ ID NOs 71-84 was mailed to the European Collection of Cell 
Cultures, (ECACC) Vaccine Research and Production Laboratory, Public Health 
Laboratory Service, Centre for Applied Microbiology and Research, Porton Down, 
Salisbury, Wiltshire SP4 OJG, United Kingdom on January 28, 1999 and was received 
on January 29, 1999. This pool of cells has the ECACC Accession # XXXXXX. One 
or more pools of cells containing the extended cDNAs of SEQ ID Nos: 130-154, from 



-72- 



36.US3.DIV 

which the cells containing a particular polynucleotide is obtainable, will be deposited with 
the European Collection of Cell Cultures, Vaccine Research and Production 
Laboratory, Public Health Laboratory Service, Centre for Applied Microbiology and 
Research, Porton Down, Salisbury, Wiltshire SP4 OJG, United Kingdom and will be 
5 assigned ECACC deposit number XXXXXXX. Table VII provides the internal 

designation number assigned to each SEQ ID NO. and indicates whether the sequence is a 
nucleic acid sequence or a protein sequence. 

[0288] Each extended cDNA can be removed from the pED6dpc2 vector 

in which it was deposited by performing a NotI, PstI double digestion to produce the 
10 appropriate fragment for each clone. The proteins encoded by the extended cDNAs may 

also be expressed from the promoter in pED6dpc2. 

[0289] Bacterial cells containing a particular clone can be obtained from 

the composite deposit as follows: 

[0290] An oligonucleotide probe or probes should be designed to the 

15 sequence that is known for that particular clone. This sequence can be derived from the 

sequences provided herein, or from a combination of those sequences. The design of the 
oligonucleotide probe should preferably follow these parameters: 

(a) It should be designed to an area of the sequence which has the fewest 
ambiguous bases ("N's"), if any; 

20 (b) Preferably, the probe is designed to have a T m of approx. 80°C 

(assuming 2 degrees for each A or T and 4 degrees for each G or C). 
However, probes having melting temperatures between 40 °C and 80 °C 
may also be used provided that specificity is not lost. 

[0291] The oligonucleotide should preferably be labeled with 

25 (-[ 32 P]ATP (specific activity 6000 Ci/mmole) and T4 polynucleotide kinase using 

commonly employed techniques for labeling oligonucleotides. Other labeling techniques 
can also be used. Unincorporated label should preferably be removed by gel filtration 
chromatography or other established methods. The amount of radioactivity incorporated 
into the probe should be quantified by measurement in a scintillation counter. Preferably, 
30 specific activity of the resulting probe should be approximately 4X1 0 6 dpm/pmole. 
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[0292] The bacterial culture containing the pool of full-length clones 

should preferably be thawed and 100 pi of the stock used to inoculate a sterile culture flask 
containing 25 ml of sterile L-broth containing ampicillin at 100 ug/ml. The culture should 
preferably be grown to saturation at 37°C, and the saturated culture should preferably be 
5 diluted in fresh L-broth. Aliquots of these dilutions should preferably be plated to 

determine the dilution and volume which will yield approximately 5000 distinct and well- 
separated colonies on solid bacteriological media containing L-broth containing ampicillin 
at 100 jag/ml and agar at 1.5% in a 150 mm petri dish when grown overnight at 37°C. 
Other known methods of obtaining distinct, well-separated colonies can also be employed. 

10 [0293] Standard colony hybridization procedures should then be used to 

transfer the colonies to nitrocellulose filters and lyse, denature and bake them. 

[0294] The filter is then preferably incubated at 65°C for 1 hour with 

gentle agitation in 6XSSC (20X stock is 175.3 g NaCl/liter, 88.2 g Na citrate/liter, 
adjusted to pH7.0 with NaOH) containing 0.5% SDS, lOOpg/ml of yeast RNA, and 

15 10 mM EDTA (approximately 10 mL per 150 mm filter). Preferably, the probe is then 

added to the hybridization mix at a concentration greater than or equal to 1X1 0 6 dpm/mL. 
The filter is then preferably incubated at 65°C with gentle agitation overnight. The filter is 
then preferably washed in 500 mL of 2X SSC/0.1% SDS at room temperature with gentle 
shaking for 15 minutes. A third wash with 0.1X SSC/0.5% SDS at 65°C for 30 minutes to 

20 1 hour is optional. The filter is then preferably dried and subjected to autoradiography for 

sufficient time to visualize the positives on the X-ray film. Other known hybridization 
methods can also be employed. 

[0295] The positive colonies are picked, grown in culture, and plasmid 

DNA isolated using standard procedures. The clones can then be verified by restriction 
25 analysis, hybridization analysis, or DNA sequencing. 

[0296] The plasmid DNA obtained using these procedures may then be 

manipulated using standard cloning techniques familiar to those skilled in the art. 
Alternatively, a PCR can be done with primers designed at both ends of the extended 
cDNA insertion. For example, a PCR reaction may be conducted using a primer having 
30 the sequence GGCCATACACTTGAGTGAC (SEQ ID NO:38) and a primer having the 
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sequence ATATAGACAAACGCACACC (SEQ. ID. NO:39). The PCR product which 
corresponds to the extended cDNA can then be manipulated using standard cloning 
techniques familiar to those skilled in the art. 

[0297] In addition to PCR based methods for obtaining extended cDNAs, 

traditional hybridization based methods may also be employed. These methods may also 
be used to obtain the genomic DNAs which encode the mRNAs from which the 5' ESTs 
were derived, mRNAs corresponding to the extended cDNAs, or nucleic acids which are 
homologous to extended cDNAs or 5' ESTs. Example 29 below provides an example of 
such methods. 



EXAMPLE 29 

Methods for Obtaining Extended cDNAs or Nucleic 

Acids Homologous to Extended cDNAs or 5' ESTs 

[0298] 5 'ESTs or extended cDNAs of the present invention may also be 

used to isolate extended cDNAs or nucleic acids homologous to extended cDNAs from a 
cDNA library or a genomic DNA library. Such cDNA library or genomic DNA library 
may be obtained from a commercial source or made using other techniques familiar to 
those skilled in the art. One example of such cDNA library construction is as follows. 

[0299] PolyA+ RNAs are prepared and their quality checked as described 

in Example 13. Then, polyA+ RNAs are ligated to an oligonucleotide tag using either the 
chemical or enzymatic methods described in above sections 1 and 2. In both cases, the 
oligonucleotide tag may contain a restriction site such as Eco RI to facilitate further 
subcloning procedures. Northern blotting is then performed to check the size of ligatured 
mRNAs and to ensure that the mRNAs were actually tagged. 

[0300] As described in Example 14, first strand synthesis is subsequently 

carried out for mRNAs joined to the oligonucleotide tag replacing the random nonamers 
by an oligodT primer. For instance, this oligodT primer may contain an internal tag of 4 
nucleotides which is different from one tissue to the other. Alternatively, the 
oligonucleotide of SEQ ID NO:14 may be used. Following second strand synthesis using 
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a primer contained in the oligonucleotide tag attached to the 5' end of mRNA, the blunt 
ends of the obtained double stranded full length DNAs are modified into cohesive ends to 
allow subcloning into the Eco RI and Hind III sites of a Bluescript vector using the 
addition of a Hind III adaptor to the 3' end of full length DNAs. 

[0301] The extended full length DNAs are then separated into several 

fractions according to their sizes using techniques familiar to those skilled in the art. For 
example, electrophoretic separation may be applied in order to yield 3 or 6 different 
fractions. Following gel extraction and purification, the DNA fractions are subcloned into 
Bluescript vectors, transformed into competent bacteria and propagated under appropriate 
antibiotic conditions. 

[0302] Such full length cDNA libraries may then be sequenced as follows 

or used in screening procedures to obtain nucleic acids homologous to extended cDNAs or 
5' ESTs as described below. 

[0303] The 5' end of extended cDNA isolated from the full length cDNA 

libraries or of nucleic acid homologous thereto may then be sequenced as described in 
example 27. In a first step, the sequence corresponding to the 5' end of the mRNA is 
obtained. If this sequence either corresponds to a SignalTag™ 5'EST or fulfills the criteria 
to be one, the cloned insert is subcloned into an appropriate vector such as pED6dpc2, 
double-sequenced and submitted to the analysis and selection procedures described in 
Example 27. 

[0304] Such cDNA or genomic DNA libraries may be used to isolate 

extended cDNAs obtained from 5' EST or nucleic acids homologous to extended cDNAs 
or 5' EST as follows. The cDNA library or genomic DNA library is hybridized to a 
detectable probe comprising at least 10 consecutive nucleotides from the 5' EST or 
extended cDNA using conventional techniques. Preferably, the probe comprises at least 
12, 15, or 17 consecutive nucleotides from the 5' EST or extended cDNA. More 
preferably, the probe comprises at least 20 to 30 consecutive nucleotides from the 5' EST 
or extended cDNA. In some embodiments, the probe comprises at least 40, at least 50, at 
least 75, at least 100, at least 150, or at least 200 consecutive nucleotides from the 5' EST 
or extended cDNA. [0302] Techniques for identifying cDNA clones in a cDNA library 
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which hybridize to a given probe sequence are disclosed in Sambrook et al, Molecular 
Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989, the 
disclosure of which is incorporated herein by reference. The same techniques may be used 
to isolate genomic DNAs. 
5 [0305] Briefly, cDNA or genomic DNA clones which hybridize to the 

detectable probe are identified and isolated for further manipulation as follows. A probe 
comprising at least 10 consecutive nucleotides from the 5' EST or extended cDNA is 
labeled with a detectable label such as a radioisotope or a fluorescent molecule. 
Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the 5' 
[0 EST or extended cDNA. More preferably, the probe comprises 20 to 30 consecutive 

nucleotides from the 5' EST or extended cDNA. In some embodiments, the probe 
comprises at least 40, at least 50, at least 75, at least 100, at least 150, or at least 200 
consecutive nucleotides from the 5' EST or extended cDNA. [0304] Techniques 

for labeling the probe are well known and include phosphorylation with polynucleotide 
15 kinase, nick translation, in vitro transcription, and non-radioactive techniques. The cDNAs 

or genomic DNAs in the library are transferred to a nitrocellulose or nylon filter and 
denatured. After blocking of non-specific sites, the filter is incubated with the labeled 
probe for an amount of time sufficient to allow binding of the probe to cDNAs or genomic 
DNAs containing a sequence capable of hybridizing thereto. 
20 [0306] By varying the stringency of the hybridization conditions used to 

identify extended cDNAs or genomic DNAs which hybridize to the detectable probe, 
extended cDNAS having different levels of homology to the probe can be identified and 
isolated as described below. 

1. Identification of Extended cDNA or Genomic DNA Sequences Having a High 
25 Degree of Homology to the Labeled Probe 

[0307] To identify extended cDNAs or genomic DNAs having a high 

degree of homology to the probe sequence, the melting temperature of the probe may be 
calculated using the following formulas: 
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[0308] For probes between 14 and 70 nucleotides in length the melting 

temperature (Tm) is calculated using the formula: Tmr=8l.5+l6.6Q.og 
[Na+])+0,41(fraction G+C)-(600/N) where N is the length of the probe, 

[0309] If the hybridization is carried out in a solution containing 

5 formamide, the melting temperature may be calculated using the equation 

Tm=81.5+16.6(log [Na+])+0.41 (fraction G+C)-(0.63% formamide)-(600/N) where N is 
the length of the probe. 

[0310] Prehybridization may be carried out in 6X SSC, 5X Denhardfs 

reagent, 0.5% SDS, 100 \xg denatured fragmented salmon sperm DNA or 6X SSC, 5X 
10 Denhardfs reagent, 0.5% SDS, 100 |ug denatured fragmented salmon sperm DNA, 50% 
formamide. The formulas for SSC and Denhardfs solutions are listed in Sambrook et al 9 
supra. 

[0311] Hybridization is conducted by adding the detectable probe to the 

prehybridization solutions listed above. Where the probe comprises double stranded 

15 DNA, it is denatured before addition to the hybridization solution. The filter is contacted 

with the hybridization solution for a sufficient period of time to allow the probe to 
hybridize to extended cDNAs or genomic DNAs containing sequences complementary 
thereto or homologous thereto. For probes over 200 nucleotides in length, the 
hybridization may be carried out at 15-25°C below the Tm. For shorter probes, such as 

20 oligonucleotide probes, the hybridization may be conducted at 15-25°C below the Tm. 

Preferably, for hybridizations in 6X SSC, the hybridization is conducted at approximately 
68°C. Preferably, for hybridizations in 50% formamide containing solutions, the 
hybridization is conducted at approximately 42°C. 

[0312] All of the foregoing hybridizations would be considered to be under 

25 "stringent" conditions. 

[0313] Following hybridization, the filter is washed in 2X SSC, 0.1% SDS 

at room temperature for 15 minutes. The filter is then washed with 0.1X SSC, 0.5% SDS 
at room temperature for 30 minutes to 1 hour. Thereafter, the solution is washed at the 
hybridization temperature in 0.1X SSC, 0.5% SDS. A final wash is conducted in 0.1X 
30 SSC at room temperature. 
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[0314] Extended cDNAs, nucleic acids homologous to extended cDNAs or 

5' ESTs, or genomic DNAs which have hybridized to the probe are identified by 
autoradiography or other conventional techniques. 

5 2. Obtaining Extended cDNA or Genomic DNA Sequences Having Lower Degrees 
of Homology to the Labeled Probe 

[0315] The above procedure may be modified to identify extended 

cDNAs, nucleic acids homologous to extended cDNAs, or genomic DNAs having 
decreasing levels of homology to the probe sequence. For example, to obtain extended 

10 cDNAs, nucleic acids homologous to extended cDNAs, or genomic DNAs of decreasing 
homology to the detectable probe, less stringent conditions may be used. For example, the 
hybridization temperature may be decreased in increments of 5°C from 68°C to 42°C in a 
hybridization buffer having a sodium concentration of approximately 1M. Following 
hybridization, the filter may be washed with 2X SSC, 0.5% SDS at the temperature of 

15 hybridization. These conditions are considered to be "moderate" conditions above 50°C 
and "low" conditions below 50°C. 

[0316] Alternatively, the hybridization may be carried out in buffers, such 

as 6X SSC, containing formamide at a temperature of 42°C. In this case, the concentration 
of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 
20 0% to identify clones having decreasing levels of homology to the probe. Following 
hybridization, the filter may be washed with 6X SSC, 0.5% SDS at 50°C. These conditions 
are considered to be "moderate" conditions above 25% formamide and "low" conditions 
below 25% formamide. 

[0317] Extended cDNAs, nucleic acids homologous to extended cDNAs, 

25 or genomic DNAs which have hybridized to the probe are identified by autoradiography. 

3. Determination of the Degree of Homology between the Obtained Extended 
cDNAs or Genomic DNAs and the Labeled Probe 

[031 8] To determine the level of homology between the hybridized nucleic 

acid and the extended cDNA or 5'EST from which the probe was derived, the nucleotide 
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sequences of the hybridized nucleic acid and the extended cDNA or 5'EST from which the 
probe was derived are compared. The sequences of the extended cDNA or 5'EST and the 
homologous sequences may be stored on a computer readable medium as described in 
Example 17 above and may be compared using any of a variety of algorithms familiar to 
5 those skilled in the art. For example, if it is desired to obtain nucleic acids homologous to 
extended cDNAs, such as allelic variants thereof or nucleic acids encoding proteins related 
to the proteins encoded by the extended cDNAs, the level of homology between the 
hybridized nucleic acid and the extended cDNA or 5' EST used as the probe may be 
determined using algorithms such as BLAST2N; parameters may be adapted depending on 
10 the sequence length and degree of homology studied. For example, the default parameters 
or the parameters in Table I and II may be used to determine homology levels. 

[0319] Alternatively, the level of homology between the hybridized 

nucleic acid and the extended cDNA or 5'EST from which the probe was derived may be 
determined using the FASTDB algorithm described in Brutlag et al. Comp. App. Biosci. 
15 6:237-245, 1990. In such analyses the parameters may be selected as follows: 

Matrix=Unitary, k-tuple=4 ? Mismatch Penalty=l, Joining Penalty=30, Randomization 
Group Length^O, Cutoff Score=l, Gap Penalty^S, Gap Size Penalty^O.05, Window 
Size-500 or the length of the sequence which hybridizes to the probe, whichever is 
shorter. Because the FASTDB program does not consider 5' or 3 ; truncations when 
20 calculating homology levels, if the sequence which hybridizes to the probe is truncated 
relative to the sequence of the extended cDNA or 5'EST from which the probe was derived 
the homology level is manually adjusted by calculating the number of nucleotides of the 
extended cDNA or 5'EST which are not matched or aligned with the hybridizing sequence, 
determining the percentage of total nucleotides of the hybridizing sequence which the non- 
25 matched or non-aligned nucleotides represent, and subtracting this percentage from the 
homology level. For example, if the hybridizing sequence is 700 nucleotides in length 
and the extended cDNA sequence is 1000 nucleotides in length wherein the first 300 bases 
at the 5' end of the extended cDNA are absent from the hybridizing sequence, and wherein 
the overlapping 700 nucleotides are identical, the homology level would be adjusted as 
30 follows. The non-matched, non-aligned 300 bases represent 30% of the length of the 
extended cDNA. If the overlapping 700 nucleotides are 100% identical, the adjusted 
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homology level would be 100-30=70% homology. It should be noted that the preceding 
adjustments are only made when the non-matched or non-aligned nucleotides are at the 
5' or J ends. No adjustments are made if the non-matched or non-aligned sequences are 
internal or under any other conditions. 

5 [0320] For example, using the above methods, nucleic acids having at least 

95% nucleic acid homology, at least 96% nucleic acid homology, at least 97% nucleic acid 
homology, at least 98% nucleic acid homology, at least 99% nucleic acid homology, or 
more than 99% nucleic acid homology to the extended cDNA or 5'EST from which the 
probe was derived may be obtained and identified. Such nucleic acids may be allelic 
10 variants or related nucleic acids from other species. Similarly, by using progressively less 
stringent hybridization conditions one can obtain and identify nucleic acids having at least 
90%, at least 85%, at least 80% or at least 75% homology to the extended cDNA or 5 f EST 
from which the probe was derived. 

[0321] To determine whether a clone encodes a protein having a given 

15 amount of homology to the protein encoded by the extended cDNA or 5' EST, the amino 
acid sequence encoded by the extended cDNA or 5' EST is compared to the amino acid 
sequence encoded by the hybridizing nucleic acid. The sequences encoded by the 
extended cDNA or 5'EST and the sequences encoded by the homologous sequences may 
be stored on a computer readable medium as described in Example 17 above and may be 

20 compared using any of a variety of algorithms familiar to those skilled in the art. 

Homology is determined to exist when an amino acid sequence in the extended cDNA or 
5' EST is closely related to an amino acid sequence in the hybridizing nucleic acid. A 
sequence is closely related when it is identical to that of the extended cDNA or 5' EST or 
when it contains one or more amino acid substitutions therein in which amino acids having 

25 similar characteristics have been substituted for one another. Using the above methods 
and algorithms such as FASTA with parameters depending on the sequence length and 
degree of homology studied, for example the default parameters or the parameters in Table 
I and II, one can obtain nucleic acids encoding proteins having at least 99%, at least 98%, 
at least 97%, at least 96%, at least 95%, at least 90%, at least 85%, at least 80% or at least 

30 75% homology to the proteins encoded by the extended cDNA or 5'EST from which the 

probe was derived. In some embodiments, the homology levels can be determined using 
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the "default" opening penalty and the "default" gap penalty, and a scoring matrix such 
as PAM 250 (a standard scoring matrix; see Dayhoff et al., in: Atlas of Protein 
Sequence and Structure, Vol 5, Supp. 3 (1978)). 

[0322] Alternatively, the level of homology may be determined using the 

5 FASTDB algorithm described by Brutlag et al Comp. App. Biosci. 6:237-245, 1990. In 
such analyses the parameters may be selected as follows: MatrixHPAM 0, k-tuple=2, 
Mismatch Penalty^ 1 ? Joining Penalty=20, Randomization Group Length^O, Cutoff 
Score=l, Window Size^Sequence Length, Gap Penalty=5, Gap Size Penalty=0.05, 
Window Size^SOO or the length of the homologous sequence, whichever is shorter. If the 

10 homologous amino acid sequence is shorter than the amino acid sequence encoded by the 
extended cDNA or 5'EST as a result of an N terminal and/or C terminal deletion the results 
may be manually corrected as follows. First, the number of amino acid residues of the 
amino acid sequence encoded by the extended cDNA or 5'EST which are not matched or 
aligned with the homologous sequence is determined, Then, the percentage of the length 

15 of the sequence encoded by the extended cDNA or 5'EST which the non-matched or non- 
aligned amino acids represent is calculated. This percentage is subtracted from the 
homology level. For example wherein the amino acid sequence encoded by the extended 
cDNA or 5'EST is 100 amino acids in length and the length of the homologous sequence is 
80 amino acids and wherein the amino acid sequence encoded by the extended cDNA or 

20 5'EST is truncated at the N terminal end with respect to the homologous sequence, the 
homology level is calculated as follows. In the preceding scenario there are 20 non- 
matched, non-aligned amino acids in the sequence encoded by the extended cDNA or 
5'EST. This represents 20% of the length of the amino acid sequence encoded by the 
extended cDNA or 5'EST. If the remaining amino acids are 1005 identical between the 

25 two sequences, the homology level would be 100%-20%=80% homology. No 
adjustments are made if the non-matched or non-aligned sequences are internal or under 
any other conditions. 

[0323] In addition to the above described methods, other protocols are 

available to obtain extended cDNAs using 5' ESTs as outlined in the following 
30 paragraphs. 
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[0324] Extended cDNAs may be prepared by obtaining mRNA from the 

tissue, cell, or organism of interest using mRNA preparation procedures utilizing polyA 
selection procedures or other techniques known to those skilled in the art. A first primer 
capable of hybridizing to the polyA tail of the mRNA is hybridized to the mRNA and a 
5 reverse transcription reaction is performed to generate a first cDNA strand. 

[0325] The first cDNA strand is hybridized to a second primer containing 

at least 10 consecutive nucleotides of the sequences of the 5' EST for which an extended 
cDNA is desired. Preferably, the primer comprises at least 12, 15, or 17 consecutive 
nucleotides from the sequences of the 5' EST. More preferably, the primer comprises 20 

10 to 30 consecutive nucleotides from the sequences of the 5 f EST. In some embodiments, 
the primer comprises more than 30 nucleotides from the sequences of the 5' EST. If it is 
desired to obtain extended cDNAs containing the full protein coding sequence, including 
the authentic translation initiation site, the second primer used contains sequences located 
upstream of the translation initiation site. The second primer is extended to generate a 

15 second cDNA strand complementary to the first cDNA strand. Alternatively, RT-PCR 
may be performed as described above using primers from both ends of the cDNA to be 
obtained. 

[0326] Extended cDNAs containing 5' fragments of the mRNA may be 

prepared by hybridizing an mRNA comprising the sequence of the 5' EST for which an 
20 extended cDNA is desired with a primer comprising at least 10 consecutive nucleotides of 
the sequences complementary to the 5' EST and reverse transcribing the hybridized primer 
to make a first cDNA strand from the mRNAs. Preferably, the primer comprises at least 
12, 15, or 17 consecutive nucleotides from the 5' EST. More preferably, the primer 
comprises 20 to 30 consecutive nucleotides from the 5' EST. 

25 [0327] Thereafter, a second cDNA strand complementary to the first 

cDNA strand is synthesized. The second cDNA strand may be made by hybridizing a 
primer complementary to sequences in the first cDNA strand to the first cDNA strand and 
extending the primer to generate the second cDNA strand. 

[0328] The double stranded extended cDNAs made using the methods 

30 described above are isolated and cloned. The extended cDNAs may be cloned into vectors 



-83- 



36.US3.DIV 

such as plasmids or viral vectors capable of replicating in an appropriate host cell For 
example, the host cell may be a bacterial, mammalian, avian, or insect cell. 

[0329] Techniques for isolating mRNA, reverse transcribing a primer 

hybridized to mRNA to generate a first cDNA strand, extending a primer to make a second 

5 cDNA strand complementary to the first cDNA strand, isolating the double stranded 
cDNA and cloning the double stranded cDNA are well known to those skilled in the art 
and are described in Current Protocols in Molecular Biology, John Wiley 503 Sons, Inc. 
1997 and Sambrook et ah, Molecular Cloning: A Laboratory Manual, Second Edition, 
Cold Spring Harbor Laboratory Press, 1989, the entire disclosures of which are 

1 0 incorporated herein by reference. 

[0330] Alternatively, other procedures may be used for obtaining full 

length cDNAs or extended cDNAs. In one approach, full length or extended cDNAs are 
prepared from mRNA and cloned into double stranded phagemids as follows. The cDNA 
library in the double stranded phagemids is then rendered single stranded by treatment 

15 with an endonuclease, such as the Gene II product of the phage Fl, and an exonuclease 
(Chang et al 7 Gene 127:95-8, 1993). A biotinylated oligonucleotide comprising the 
sequence of a 5' EST, or a fragment containing at least 10 nucleotides thereof, is 
hybridized to the single stranded phagemids. Preferably, the fragment comprises at least 
12, 15, or 17 consecutive nucleotides from the 5' EST. More preferably, the fragment 

20 comprises 20-30 consecutive nucleotides from the 5' EST. In some procedures, the 
fragment may comprise at least 40, at least 50, at least 75, at least 100, at least 150, or at 
least 200 conscutive nucleotides from the 5' EST. 

[0331] Hybrids between the biotinylated oligonucleotide and phagemids 

having inserts containing the 5' EST sequence are isolated by incubating the hybrids with 
25 streptavidin coated paramagnetic beads and retrieving the beads with a magnet (Fry et aL 9 

Biotechniques, 13: 124-131, 1992). Thereafter, the resulting phagemids containing the 5' 
EST sequence are released from the beads and converted into double stranded DNA using 
a primer specific for the 5' EST sequence. Alternatively, protocols such as the Gene 
Trapper kit (Gibco BRL) may be used. The resulting double stranded DNA is transformed 
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into bacteria. Extended cDNAs containing the 5 r EST sequence are identified by colony 
PCR or colony hybridization. 

[0332] Using any of the above described methods in section III, a plurality 

of extended cDNAs containing full length protein coding sequences or sequences encoding 
5 only the mature protein remaining after the signal peptide is cleaved off may be provided 

as cDNA libraries for subsequent evaluation of the encoded proteins or use in diagnostic 
assays as described below. 

IV. Expression of Proteins Encoded by Extended cDNAs Isolated Using 5' ESTs 

10 [0333] Extended cDNAs containing the full protein coding sequences of 

their corresponding mRNAs or portions thereof, such as cDNAs encoding the mature 
protein, may be used to express the secreted proteins or portions thereof which they encode 
as described in Example 30 below. If desired, the extended cDNAs may contain the 
sequences encoding the signal peptide to facilitate secretion of the expressed protein. It 

15 will be appreciated that a plurality of extended cDNAs containing the full protein coding 
sequences or portions thereof may be simultaneously cloned into expression vectors to 
create an expression library for analysis of the encoded proteins as described below. 
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EXAMPLE 30 

Expression of the Proteins Encoded by Extended cDNAs or Portions Thereof 

[0334] To express the proteins encoded by the extended cDNAs or 

portions thereof, nucleic acids containing the coding sequence for the proteins or portions 
5 thereof to be expressed are obtained as described in Examples 27-29 and cloned into a 
suitable expression vector. If desired, the nucleic acids may contain the sequences 
encoding the signal peptide to facilitate secretion of the expressed protein. For example, 
the nucleic acid may comprise the sequence of one of SEQ ID NOs: 40-84 and 130-154 
listed in Table IV and in the accompanying sequence listing. Alternatively, the nucleic 
10 acid may comprise those nucleotides which make up the full coding sequence of one of the 

sequences of SEQ ID NOs: 40-84 and 130-154 as defined in Table IV above. 

[0335] It will be appreciated that should the extent of the full coding 

sequence (i.e. the sequence encoding the signal peptide and the mature protein resulting 
from cleavage of the signal peptide) differ from that listed in Table IV as a result of a 

15 sequencing error, reverse transcription or amplification error, mRNA splicing, post- 

translational modification of the encoded protein, enzymatic cleavage of the encoded 
protein, or other biological factors, one skilled in the art would be readily able to identify 
the extent of the full coding sequences in the sequences of SEQ ID NOs. 40-84 and 130- 
154. Accordingly, the scope of any claims herein relating to nucleic acids containing the 

20 fall coding sequence of one of SEQ ID NOs. 40-84 and 130-154 is not to be construed as 
excluding any readily identifiable variations from or equivalents to the full coding 
sequences listed in Table IV Similarly, should the extent of the fall length polypeptides 
differ from those indicated in Table V as a result of any of the preceding factors, the scope 
of claims relating to polypeptides comprising the amino acid sequence of the fall length 

25 polypeptides is not to be construed as excluding any readily identifiable variations from or 

equivalents to the sequences listed in Table V. 

[0336] Alternatively, the nucleic acid used to express the protein or portion 

thereof may comprise those nucleotides which encode the mature protein (i.e. the protein 
created by cleaving the signal peptide off) encoded by one of the sequences of SEQ ID 
30 NOs: 40-84 and 1 30-1 54 as defined in Table IV above. 
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[0337] It will be appreciated that should the extent of the sequence 

encoding the mature protein differ from that listed in Table IV as a result of a sequencing 
error, reverse transcription or amplification error, mRNA splicing, post-translational 
modification of the encoded protein, enzymatic cleavage of the encoded protein, or other 
5 biological factors, one skilled in the art would be readily able to identify the extent of the 

sequence encoding the mature protein in the sequences of SEQ E) NOs. 40-84 and 130- 
154. Accordingly, the scope of any claims herein relating to nucleic acids containing the 
sequence encoding the mature protein encoded by one of SEQ ED Nos, 40-84 and 130-154 
is not to be construed as excluding any readily identifiable variations from or equivalents 

10 to the sequences listed in Table IV. Thus, claims relating to nucleic acids containing the 
sequence encoding the mature protein encompass equivalents to the sequences listed in 
Table IV, such as sequences encoding biologically active proteins resulting from post- 
translational modification, enzymatic cleavage, or other readily identifiable variations from 
or equivalents to the secreted proteins in addition to cleavage of the signal peptide. 

15 Similarly, should the extent of the mature polypeptides differ from those indicated in Table 

V as a result of any of the preceding factors, the scope of claims relating to polypeptides 
comprising the sequence of a mature protein included in the sequence of one of SEQ ID 
NOs. 85-129 and 155-179 is not to be construed as excluding any readily identifiable 
variations from or equivalents to the sequences listed in Table V. Thus, claims relating to 

20 polypeptides comprising the sequence of the mature protein encompass equivalents to the 
sequences listed in Table IV, such as biologically active proteins resulting from post- 
translational modification, enzymatic cleavage, or other readily identifiable variations from 
or equivalents to the secreted proteins in addition to cleavage of the signal peptide. It will 
also be appreciated that should the biologically active form of the polypeptides included in 

25 the sequence of one of SEQ ID NOs. 85-129 and 155-179 or the nucleic acids encoding 

the biologically active form of the polypeptides differ from those identified as the mature 
polypeptide in Table V or the nucleotides encoding the mature polypeptide in Table IV as 
a result of a sequencing error, reverse transcription or amplification error, mRNA splicing, 
post-translational modification of the encoded protein, enzymatic cleavage of the encoded 

30 protein, or other biological factors, one skilled in the art would be readily able to identify 
the amino acids in the biologically active form of the polypeptides and the nucleic acids 
encoding the biologically active form of the polypeptides. In such instances, the claims 
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relating to polypetides comprising the mature protein included in one of SEQ ID NOs. 85- 
129 and 155-179 or nucleic acids comprising the nucleotides of one of SEQ ID NOs. 40- 
84 and 130-154 encoding the mature protein shall not be construed to exclude any readily 
identifiable variations from the sequences listed in Table IV and Table V. 

5 [0338] In some embodiments, the nucleic acid used to express the protein 

or portion thereof may comprise those nucleotides which encode the signal peptide 
encoded by one of the sequences of SEQ ID NOs: 40-84 and 130-154 as defined in Table 
IV above. 

[0339] It will be appreciated that should the extent of the sequence 

10 encoding the signal peptide differ from that listed in Table IV as a result of a sequencing 

error, reverse transcription or amplification error, mRNA splicing, post-translational 
modification of the encoded protein, enzymatic cleavage of the encoded protein, or other 
biological factors, one skilled in the art would be readily able to identify the extent of the 
sequence encoding the signal peptide in the sequences of SEQ ID NOs, 40-84 and 130- 
15 154. Accordingly, the scope of any claims herein relating to nucleic acids containing the 

sequence encoding the signal peptide encoded by one of SEQ ID Nos. 40-84 and 130-154 
is not to be construed as excluding any readily identifiable variations from the sequences 
listed in Table IV. Similarly, should the extent of the signal peptides differ from those 
indicated in Table V as a result of any of the preceding factors, the scope of claims relating 
20 to polypeptides comprising the sequence of a signal peptide included in the sequence of 
one of SEQ ID NOs. 85-129 and 155-179 is not to be construed as excluding any readily 
identifiable variations from the sequences listed in Table V. 

[0340] Alternatively, the nucleic acid may encode a polypeptide 

comprising at least 10 consecutive amino acids of one of the sequences of SEQ ID NOs: 

25 85-129 and 155-179. In some embodiments, the nucleic acid may encode a polypeptide 
comprising at least 15 consecutive amino acids of one of the sequences of SEQ ID NOs: 
85-129 and 155-179. In other embodiments, the nucleic acid may encode a polypeptide 
comprising at least 25 consecutive amino acids of one of the sequences of SEQ ID NOs: 
85-129 and 155-179. In other embodiments, the nucleic acid may encode a polypeptide 

30 comprising at least 60, at least 75, at least 100 or more than 100 consecutive amino acids 

of one of the sequences of SEQ ID Nos: 85-129 and 155-179. 
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[0341] The nucleic acids inserted into the expression vectors may also 

contain sequences upstream of the sequences encoding the signal peptide, such as 
sequences which regulate expression levels or sequences which confer tissue specific 
expression. 

5 [0342] The nucleic acid encoding the protein or polypeptide to be 

expressed is operably linked to a promoter in an expression vector using conventional 
cloning technology. The expression vector may be any of the mammalian, yeast, insect or 
bacterial expression systems known in the art. Commercially available vectors and 
expression systems are available from a variety of suppliers including Genetics Institute 

10 (Cambridge, MA), Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and 
Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper 
protein folding, the codon context and codon pairing of the sequence may be optimized for 
the particular expression organism in which the expression vector is introduced, as 
explained by Hatfield, et al, U.S. Patent No. 5,082,767, incorporated herein by this 

15 reference. 

[0343] The following is provided as one exemplary method to express the 

proteins encoded by the extended cDNAs corresponding to the 5' ESTs or the nucleic 
acids described above. First, the methionine initiation codon for the gene and the poly A 
signal of the gene are identified. If the nucleic acid encoding the polypeptide to be 

20 expressed lacks a methionine to serve as the initiation site, an initiating methionine can be 

introduced next to the first codon of the nucleic acid using conventional techniques. 
Similarly, if the extended cDNA lacks a poly A signal, this sequence can be added to the 
construct by, for example, splicing out the Poly A signal from pSG5 (Stratagene) using 
Bgll and Sail restriction endonuclease enzymes and incorporating it into the mammalian 

25 expression vector pXTl (Stratagene). pXTl contains the LTRs and a portion of the gag 
gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct 
allow efficient stable transfection. The vector includes the Herpes Simplex Thymidine 
Kinase promoter and the selectable neomycin gene. The extended cDNA or portion 
thereof encoding the polypeptide to be expressed is obtained by PCR from the bacterial 

30 vector using oligonucleotide primers complementary to the extended cDNA or portion 

thereof and containing restriction endonuclease sequences for Pst I incorporated into the 
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5'primer and Bglll at the 5' end of the corresponding cDNA 3' primer, taking care to 
ensure that the extended cDNA is positioned in frame with the poly A signal The purified 
fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with 
an exonuclease, digested with Bgl II, purified and ligated to pXTl ? now containing a poly 
5 A signal and digested with BgllL 

[0344] The ligated product is transfected into mouse NIH 3T3 cells using 

Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in 
the product specification. Positive transfectants are selected after growing the transfected 
cells in 600ug/ml G418 (Sigma, St. Louis, Missouri). Preferably the expressed protein is 
10 released into the culture medium, thereby facilitating purification. 

[0345] Alternatively, the extended cDNAs may be cloned into pED6dpc2 

as described above. The resulting pED6dpc2 constructs may be transfected into a suitable 
host cell, such as COS 1 cells. Methotrexate resistant cells are selected and expanded. 
Preferably, the protein expressed from the extended cDNA is released into the culture 
1 5 medium thereby facilitating purification. 

[0346] Proteins in the culture medium are separated by gel electrophoresis. 

If desired, the proteins may be ammonium sulfate precipitated or separated based on size 
or charge prior to electrophoresis, 

[0347] As a control, the expression vector lacking a cDNA insert is 

20 introduced into host cells or organisms and the proteins in the medium are harvested. The 
secreted proteins present in the medium are detected using techniques such as Coomassie 
or silver staining or using antibodies against the protein encoded by the extended cDNA. 
Coomassie and silver staining techniques are familiar to those skilled in the art. 

[0348] Antibodies capable of specifically recognizing the protein of 

25 interest may be generated using synthetic 15-mer peptides having a sequence encoded by 

the appropriate 5' EST, extended cDNA, or portion thereof. The synthetic peptides are 
injected into mice to generate antibody to the polypeptide encoded by the 5' EST, extended 
cDNA, or portion thereof. 

[0349] Secreted proteins from the host cells or organisms containing an 

30 expression vector which contains the extended cDNA derived from a 5' EST or a portion 
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thereof are compared to those from the control cells or organism. The presence of a band 
in the medium from the cells containing the expression vector which is absent in the 
medium from the control cells indicates that the extended cDNA encodes a secreted 
protein. Generally, the band corresponding to the protein encoded by the extended cDNA 
5 will have a mobility near that expected based on the number of amino acids in the open 

reading frame of the extended cDNA. However, the band may have a mobility different 
than that expected as a result of modifications such as glycosylation, ubiquitination, or 
enzymatic cleavage. 

[0350] Alternatively, if the protein expressed from the above expression 

10 vectors does not contain sequences directing its secretion, the proteins expressed from host 
cells containing an expression vector containing an insert encoding a secreted protein or 
portion thereof can be compared to the proteins expressed in host cells containing the 
expression vector without an insert. The presence of a band in samples from cells 
containing the expression vector with an insert which is absent in samples from cells 
15 containing the expression vector without an insert indicates that the desired protein or 

portion thereof is being expressed. Generally, the band will have the mobility expected for 
the secreted protein or portion thereof. However, the band may have a mobility different 
than that expected as a result of modifications such as glycosylation, ubiquitination, or 
enzymatic cleavage. 

20 [0351] The protein encoded by the extended cDNA may be purified using 

standard immunochromatography techniques. In such procedures, a solution containing 
the secreted protein, such as the culture medium or a cell extract, is applied to a column 
having antibodies against the secreted protein attached to the chromatography matrix. The 
secreted protein is allowed to bind the immunochromatography column. Thereafter, the 

25 column is washed to remove non-specifically bound proteins. The specifically bound 
secreted protein is then released from the column and recovered using standard techniques. 

[0352] If antibody production is not possible, the extended cDNA 

sequence or portion thereof may be incorporated into expression vectors designed for use 
in purification schemes employing chimeric polypeptides. In such strategies the coding 
30 sequence of the extended cDNA or portion thereof is inserted in frame with the gene 
encoding the other half of the chimera. The other half of the chimera may be (5-globin or a 
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nickel binding polypeptide encoding sequence. A chromatography matrix having antibody 
to p-globin or nickel attached thereto is then used to purify the chimeric protein. Protease 
cleavage sites may be engineered between the {5-globin gene or the nickel binding 
polypeptide and the extended cDNA or portion thereof. Thus, the two polypeptides of the 
5 chimera may be separated from one another by protease digestion. 

[0353] One useful expression vector for generating P-globin chimerics is 

pSG5 (Stratagene), which encodes rabbit p-globin. Intron II of the rabbit p-globin gene 
facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated 
into the construct increases the level of expression. These techniques as described are well 

10 known to those skilled in the art of molecular biology. Standard methods are published in 
methods texts such as Davis et al, (Basic Methods in Molecular Biology, L.G. Davis, 
M.D. Dibner, and J.F. Battey, ed., Elsevier Press, NY, 1986) and many of the methods are 
available from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may 
additionally be produced from the construct using in vitro translation systems such as the 

15 In vitro Express™ Translation Kit (Stratagene). 

[0354] Following expression and purification of the secreted proteins 

encoded by the 5' ESTs, extended cDNAs, or fragments thereof, the purified proteins may 
be tested for the ability to bind to the surface of various cell types as described in Example 
31 below. It will be appreciated that a plurality of proteins expressed from these cDNAs 
20 may be included in a panel of proteins to be simultaneously evaluated for the activities 
specifically described below, as well as other biological roles for which assays for 
determining activity are available. 

EXAMPLE 31 

25 Analysis of Secreted Proteins to Determine Whether they Bind to the Cel l Surface 

[0355] The proteins encoded by the 5' ESTs, extended cDNAs, or 

fragments thereof are cloned into expression vectors such as those described in Example 
30. The proteins are purified by size, charge, immunochromatography or other techniques 
familiar to those skilled in the art. Following purification, the proteins are labeled using 
30 techniques known to those skilled in the art. The labeled proteins are incubated with cells 
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or cell lines derived from a variety of organs or tissues to allow the proteins to bind to any 
receptor present on the cell surface. Following the incubation, the cells are washed to 
remove non-specifically bound protein. The labeled proteins are detected by 
autoradiography. Alternatively, unlabeled proteins may be incubated with the cells and 
5 detected with antibodies having a detectable label, such as a fluorescent molecule, attached 
thereto. 

[0356] Specificity of cell surface binding may be analyzed by conducting a 

competition analysis in which various amounts of unlabeled protein are incubated along 
with the labeled protein. The amount of labeled protein bound to the cell surface decreases 
10 as the amount of competitive unlabeled protein increases. As a control, various amounts 
of an unlabeled protein unrelated to the labeled protein is included in some binding 
reactions. The amount of labeled protein bound to the cell surface does not decrease in 
binding reactions containing increasing amounts of unrelated unlabeled protein, indicating 
that the protein encoded by the cDNA binds specifically to the cell surface. 

15 [0357] As discussed above, secreted proteins have been shown to have a 

number of important physiological effects and, consequently, represent a valuable 
therapeutic resource. The secreted proteins encoded by the extended cDNAs or portions 
thereof made according to Examples 27-29 may be evaluated to determine their 
physiological activities as described below. 

20 

EXAMPLE 32 

Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof fo r Cytokine, 
Cell Proliferation or Cell Differentiation Activity 

[0358] As discussed above, secreted proteins may act as cytokines or may 

25 affect cellular proliferation or differentiation. Many protein factors discovered to date, 

including all known cytokines, have exhibited activity in one or more factor dependent cell 
proliferation assays, and hence the assays serve as a convenient confirmation of cytokine 
activity. The activity of a protein of the present invention is evidenced by any one of a 
number of routine factor dependent cell proliferation assays for cell lines including, 
30 without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+), 
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2E8, RB5, DAI, 123, Tl 165, HT2, CTLL2, TF-1, Mo7c and CMK. The proteins encoded 
by the above extended cDNAs or portions thereof may be evaluated for their ability to 
regulate T cell or thymocyte proliferation in assays such as those described above or in the 
following references, which are incorporated herein by reference: Current Protocols in 
5 Immunology, Ed. by J.E. Coligan et al, Greene Publishing Associates and Wiley- 

Interscience; Takai et al. J. Immunol. 137:3494-3500, 1986. Bertagnolli et al. J. 
Immunol 145:1706-1712, 1990. Bertagnolli et al, Cellular Immunology 133:327-341, 
1991. Bertagnolli, et al. J. Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol. 
152:1756-1761, 1994. 

10 [0359] In addition, numerous assays for cytokine production and/or the 

proliferation of spleen cells, lymph node cells and thymocytes are known. These include 
the techniques disclosed in Current Protocols in Immunology. J.E. Coligan et al. Eds., 
Vol 1 pp. 3.12.1-3.12.14 John Wiley and Sons, Toronto. 1994; and Schreiber, R.D. 
Current Protocols in Immunology., supra Vol 1 pp. 6.8.1-6.8.8, John Wiley and Sons, 

15 Toronto. 1994. 

[0360] The proteins encoded by the cDNAs may also be assayed for the 

ability to regulate the proliferation and differentiation of hematopoietic or lymphopoietic 
cells. Many assays for such activity are familiar to those skilled in the art, including the 
assays in the following references, which are incorporated herein by reference: Bottomly, 

20 K., Davis, L.S. and Lipsky, P.R, Measurement of Human and Murine Interleukin 2 and 

Interleukin 4, Current Protocols in Immunology., J.E. Coligan et al. Eds. Vol 1 pp. 
6.3.1-6.3.12, John Wiley and Sons, Toronto. 1991; deVries et al., J, Exp. Med. 173:1205- 
1211, 1991; Moreau et al, Nature 36:690-692, 1988; Greenberger et al., Proc. NatL 
Acad. ScL U.S.A. 80:2931-2938, 1983; Nordan, R., Measurement of Mouse and Human 

25 Interleukin 6 Current Protocols in Immunology. J.E. Coligan et al. Eds. Vol 1 pp. 6.6.1- 
6.6.5, John Wiley and Sons, Toronto. 1991; Smith et al, Proc. NatL Acad. Sci. U.S.A. 
83:1857-1861, 1986; Bennett, F., Giannotti, J., Clark, S.C. and Turner, K.J., Measurement 
of Human Interleukin 1 1 Current Protocols in Immunology. J.E. Coligan et al. Eds. Vol 
1 pp. 6.15.1 John Wiley and Sons, Toronto. 1991; Ciarletta, A., Giannotti, J., Clark, S.C. 

30 and Turner, KJ., Measurement of Mouse and Human Interleukin 9 Current Protocols in 
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Immunology. J.E. Coligan et al., Eds. Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto. 
1991. 

[0361] The proteins encoded by the cDNAs may also be assayed for their 

ability to regulate T-cell responses to antigens. Many assays for such activity are familiar 

5 to those skilled in the art, including the assays described in the following references, which 

are incorporated herein by reference: Chapter 3 (In Vitro Assays for Mouse Lymphocyte 
Function), Chapter 6 (Cytokines and Their Cellular Receptors) and Chapter 7, 
(Immunologic Studies in Humans) in Current Protocols in Immunology, J.E. Coligan et 
al. Eds. Greene Publishing Associates and Wiley-Interscience; Weinberger et al., Proc. 

10 Natl. Acad* ScL USA 77:6091-6095, 1980; Weinberger et al., Eur. J. Immun. 11:405- 
411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al, J. Immunol. 
140:508-512, 1988. 

[0362] Those proteins which exhibit cytokine, cell proliferation, or cell 

differentiation activity may then be formulated as pharmaceuticals and used to treat 
15 clinical conditions in which induction of cell proliferation or differentiation is beneficial. 
Alternatively, as described in more detail below, genes encoding these proteins or nucleic 
acids regulating the expression of these proteins may be introduced into appropriate host 
cells to increase or decrease the expression of the proteins as desired. 

20 EXAMPLE 33 

Assaying the Proteins Expressed from Extended cDNAs or Portions 

Thereof for Activity as Immune System Regulators 

[0363 1] The proteins encoded by the cDNAs may also be evaluated for their 
effects as immune regulators. For example, the proteins may be evaluated for their activity 

25 to influence thymocyte or splenocyte cytotoxicity. Numerous assays for such activity are 

familiar to those skilled in the art including the assays described in the following 
references, which are incorporated herein by reference: Chapter 3 (In Vitro Assays for 
Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic studies in Humans) 
in Current Protocols in Immunology, J.E. Coligan et al. Eds, Greene Publishing 

30 Associates and Wiley-Interscience; Herrmann et al, Proc, Natl. Acad, Sci. USA 78:2488- 
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2492, 1981; Herrmann et al, J. Immunol. 128:1968-1974, 1982; Handa et al, J. 
Immunol 135:1564-1572, 1985; Takai et al., J. Immunol 137:3494-3500, 1986; Takai et 
al., J. Immunol. 140:508-512, 1988; Herrmann et al., Proc. NatL Acad. Scl USA 
78:2488-2492, 1981; Herrmann et al, J. Immunol 128:1968-1974, 1982; Handa et al., J. 
5 Immunol 135:1564-1572, 1985; Takai et al, J. Immunol 137:3494-3500, 1986; 
Bowman et al., J. Virology 61:1992-1998; Takai et al., J. Immunol 140:508-512, 1988; 
Bertagnolli et al, Cellular Immunology 133:327-341, 1991; Brown et al., J. Immunol 
153:3079-3092, 1994. 

[0364] The proteins encoded by the cDNAs may also be evaluated for their 

10 effects on T-cell dependent iinmunoglobulin responses and isotype switching. Numerous 

assays for such activity are familiar to those skilled in the art, including the assays 
disclosed in the following references, which are incorporated herein by reference: 
Maliszewski, J. Immunol 144:3028-3033, 1990; Mond, J J. and Brunswick, M Assays 
for B Cell Function: In vitro Antibody Production, Vol 1 pp. 3.8.1-3.8.16 in Current 
15 Protocols in Immunology. IE. Coligan et al Eds., John Wiley and Sons, Toronto. 1994. 

[0365] The proteins encoded by the cDNAs may also be evaluated for their 

effect on immune effector cells, including their effect on Thl cells and cytotoxic 
lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, 
including the assays disclosed in the following references, which are incorporated herein 
20 by reference: Chapter 3 (In Vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and 
Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, J.E, 
Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Takai et al., J* 
Immunol 137:3494-3500, 1986; Takai et al.; J. Immunol 140:508-512, 1988; 
Bertagnolli et al, J. Immunol 149:3778-3783, 1992. 

25 [0366] The proteins encoded by the cDNAs may also be evaluated for their 

effect on dendritic cell mediated activation of naive T-cells. Numerous assays for such 
activity are familiar to those skilled in the art, including the assays disclosed in the 
following references, which are incorporated herein by reference: Query et al, J, 
Immunol 134:536-544, 1995; Inaba et al., Journal of Experimental Medicine 173:549- 

30 559, 1991; Macatonia et al, Journal of Immunology 154:5071-5079, 1995; Porgador et 
al, Journal of Experimental Medicine 182:255-260, 1995; Nair et al, Journal of 
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Virology 67:4062-4069, 1993; Huang et aL, Science 264:961-965, 1994; Macatonia et aL, 
Journal of Experimental Medicine 169:1255-1264, 1989; Bhardwaj et ah, Journal of 
Clinical Investigation 94:797-807, 1994; and Inaba et ah, Journal of Experimental 
Medicine 172:631-640, 1990. 

5 [0367] The proteins encoded by the cDNAs may also be evaluated for their 

influence on the lifetime of lymphocytes. Numerous assays for such activity are familiar to 
those skilled in the art, including the assays disclosed in the following references, which 
are incorporated herein by reference: Darzynkiewicz et aL, Cytometry 13:795-808, 1992; 
Gorczyca et aL, Leukemia 7:659-670, 1993; Gorczyca et aL, Cancer Research 53:1945- 
10 1951, 1993; Itoh et aL, Cell 66:233-243, 1991; Zacharchuk, Journal of Immunology 
145:4037-4045, 1990; Zamai et aL, Cytometry 14:891-897, 1993; Gorczyca et aL, 
International Journal of Oncology 1:639-648, 1992, 

[0368] Assays for proteins that influence early steps of T-cell commitment 

and development include, without limitation, those described in: Antica et aL, Blood 
15 84:111-117, 1994; Fine et aL, Cellular immunology 155:111-122, 1994; Galy et aL, Blood 

85:2770-2778, 1995; Toki et aL, Proc. Nat. Acad Sci. USA 88:7548-7551, 1991. 

[0369] Those proteins which exhibit activity as immune system regulators 

activity may then be formulated as pharmaceuticals and used to treat clinical conditions in 
which regulation of immune activity is beneficial. For example, the protein may be useful 

20 in the treatment of various immune deficiencies and disorders (including severe combined 

immunodeficiency (SCID)), e.g., in regulating (up or down) growth and proliferation of T 
and/or B lymphocytes, as well as effecting the cytolytic activity of NK cells and other cell 
populations. These immune deficiencies may be genetic or be caused by viral (e.g., HIV) 
as well as bacterial or fungal infections, or may result from autoimmune disorders. More 

25 specifically, infectious diseases caused by viral, bacterial, fungal or other infection may be 
treatable using a protein of the present invention, including infections by HIV, hepatitis 
viruses, herpesviruses, mycobacteria, Leishmania spp., malaria spp. and various fungal 
infections such as candidiasis. Of course, in this regard, a protein of the present invention 
may also be useful where a boost to the immune system generally may be desirable, i.e., in 

30 the treatment of cancer. 
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[0370] Autoimmune disorders which may be treated using a protein of the 

present invention include, for example, connective tissue disease, multiple sclerosis, 
systemic lupus erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, 
Guillain-Barre syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitis, 
myasthenia gravis, graft-versus-host disease and autoimmune inflammatory eye disease. 
Such a protein of the present invention may also to be useful in the treatment of allergic 
reactions and conditions, such as asthma (particularly allergic asthma) or other respiratory 
problems. Other conditions, in which immune suppression is desired (including, for 
example, organ transplantation), may also be treatable using a protein of the present 
invention. 

[0371] Using the proteins of the invention it may also be possible to 

regulate immune responses, in a number of ways. Down regulation may be in the form of 
inhibiting or blocking an immune response already in progress or may involve preventing 
the induction of an immune response. The functions of activated T-cells may be inhibited 
by suppressing T cell responses or by inducing specific tolerance in T cells, or both. 
Immunosuppression of T cell responses is generally an active, non-antigen-specific, 
process which requires continuous exposure of the T cells to the suppressive agent. 
Tolerance, which involves inducing non-responsiveness or anergy in T cells, is 
distinguishable from immunosuppression in that it is generally antigen-specific and 
persists after exposure to the tolerizing agent has ceased. Operationally, tolerance can be 
demonstrated by the lack of a T cell response upon reexposure to specific antigen in the 
absence of the tolerizing agent. 

[0372] Down regulating or preventing one or more antigen functions 

(including without limitation B lymphocyte antigen functions (such as, for example, B7)), 
e.g., preventing high level lymphokine synthesis by activated T cells, will be useful in 
situations of tissue, skin and organ transplantation and in graft-versus-host disease 
(GVHD). For example, blockage of T cell function should result in reduced tissue 
destruction in tissue transplantation. Typically, in tissue transplants, rejection of the 
transplant is initiated through its recognition as foreign by T cells, followed by an immune 
reaction that destroys the transplant. The administration of a molecule which inhibits or 
blocks interaction of a B7 lymphocyte antigen with its natural ligand(s) on immune cells 
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(such as a soluble, monomelic form of a peptide having B7-2 activity alone or in 
conjunction with a monomelic form of a peptide having an activity of another B 
lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody), prior to transplantation can 
lead to the binding of the molecule to the natural ligand(s) on the immune cells without 
5 transmitting the corresponding costimulatory signal. Blocking B lymphocyte antigen 

function in this matter prevents cytokine synthesis by immune cells, such as T cells, and 
thus acts as an immunosuppressant. Moreover, the lack of costimulation may also be 
sufficient to anergize the T cells, thereby inducing tolerance in a subject. Induction of 
long-term tolerance by B lymphocyte antigen-blocking reagents may avoid the necessity 
10 of repeated administration of these blocking reagents. To achieve sufficient 

immunosuppression or tolerance in a subject, it may also be necessary to block the 
function of a combination of B lymphocyte antigens. 

[0373] The efficacy of particular blocking reagents in preventing organ 

transplant rejection or GVHD can be assessed using animal models that are predictive of 

15 efficacy in humans. Examples of appropriate systems which can be used include 

allogeneic cardiac grafts in rats and xenogeneic pancreatic islet cell grafts in mice, both of 
which have been used to examine the immunosuppressive effects of CTLA4Ig fusion 
proteins in vivo as described in Lenschow et al., Science 257:789-792 (1992) and Turka et 
al., Proc. Natl. Acad. Sci USA, 89:11102-11105 (1992). In addition, murine models of 

20 GVHD (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 846- 

847) can be used to determine the effect of blocking B lymphocyte antigen function in 
vivo on the development of that disease. 

[0374] Blocking antigen function may also be therapeutically useful for 

treating autoimmune diseases. Many autoimmune disorders are the result of inappropriate 

25 activation of T cells that are reactive against self tissue and which promote the production 

of cytokines and autoantibodies involved in the pathology of the diseases. Preventing the 
activation of autoreactive T cells may reduce or eliminate disease symptoms. 
Administration of reagents which block costimulation of T cells by disrupting receptor 
ligand interactions of B lymphocyte antigens can be used to inhibit T cell activation and 

30 prevent production of autoantibodies or T cell-derived cytokines which may be involved in 
the disease process. Additionally, blocking reagents may induce antigen-specific tolerance 
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of autoreactive T cells which could lead to long-term relief from the disease. The efficacy 
of blocking reagents in preventing or alleviating autoimmune disorders can be determined 
using a number of well-characterized animal models of human autoimmune diseases. 
Examples include murine experimental autoimmune encephalitis, systemic lupus 
5 erythmatosis in MRL/pr/pr mice or NZB hybrid mice, murine autoimmuno collagen 

arthritis, diabetes mellitus in OD mice and BB rats, and murine experimental myasthenia 
gravis (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 840- 
856). 

[0375] Upregulation of an antigen function (preferably a B lymphocyte 

10 antigen function), as a means of up regulating immune responses, may also be useful in 
therapy. Upregulation of immune responses may be in the form of enhancing an existing 
immune response or eliciting an initial immune response. For example, enhancing an 
immune response through stimulating B lymphocyte antigen function may be useful in 
cases of viral infection. In addition, systemic viral diseases such as influenza, the common 
15 cold, and encephalitis might be alleviated by the administration of stimulatory form of B 

lymphocyte antigens systemically. 

[0376] Alternatively, anti-viral immune responses may be enhanced in an 

infected patient by removing T cells from the patient, costimulating the T cells in vitro 
with viral antigen-pulsed APCs either expressing a peptide of the present invention or 
20 together with a stimulatory form of a soluble peptide of the present invention and 

reintroducing the in vitro activated T cells into the patient. The infected cells would now 
be capable of delivering a costimulatory signal to T cells in vivo, thereby activating the T 
cells. 

[0377] In another application, up regulation or enhancement of antigen 

25 function (preferably B lymphocyte antigen function) may be useful in the induction of 

tumor immunity. Tumor cells (e.g., sarcoma, melanoma, lymphoma, leukemia, 
neuroblastoma, carcinoma) transfected with a nucleic acid encoding at least one peptide of 
the present invention can be administered to a subject to overcome tumor-specific 
tolerance in the subject. If desired, the tumor cell can be transfected to express a 
30 combination of peptides. For example, tumor cells obtained from a patient can be 
transfected ex vivo with an expression vector directing the expression of a peptide having 
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B7-2-like activity alone, or in conjunction with a peptide having B7-l-like activity and/or 
B7-3-like activity. The transfected tumor cells are returned to the patient to result in 
expression of the peptides on the surface of the transfected cell. Alternatively, gene 
therapy techniques can be used to target a tumor cell for transfection in vivo. 

5 [0378] The presence of the peptide of the present invention having the 

activity of a B lymphocyte antigen(s) on the surface of the tumor cell provides the 
necessary costimulation signal to T cells to induce a T cell mediated immune response 
against the transfected tumor cells. In addition, tumor cells which lack MHC class I or 
MHC class II molecules, or which fail to reexpress sufficient amounts of MHC class I or 

10 MHC class II molecules, can be transfected with nucleic acids encoding all or a portion of 
(e.g., a cytoplasmic-domain truncated portion) of an MHC class I a chain protein and p 2 
macroglobulin protein or an MHC class II a chain protein and an MHC class II |3 chain 
protein to thereby express MHC class I or MHC class II proteins on the cell surface. 
Expression of the appropriate class II or class II MHC in conjunction with a peptide having 

15 the activity of a B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated 
immune response against the transfected tumor cell. Optionally, a gene encoding an 
antisense construct which blocks expression of an MHC class II associated protein, such as 
the invariant chain, can also be cotransfected with a DNA encoding a peptide having the 
activity of a B lymphocyte antigen to promote presentation of tumor associated antigens 

20 and induce tumor specific immunity. Thus, the induction of a T cell mediated immune 

response in a human subject may be sufficient to overcome tumor-specific tolerance in the 
subject. Alternatively, as described in more detail below, genes encoding these proteins or 
nucleic acids regulating the expression of these proteins may be introduced into 
appropriate host cells to increase or decrease the expression of the proteins as desired. 

25 

EXAMPLE 34 

Assaying the Proteins Expressed from Exten ded cDNAs 

or Portions Thereof for Hemato poiesis Regulating Activity 

[0379] The proteins encoded by the extended cDNAs or portions thereof 

30 may also be evaluated for their hematopoiesis regulating activity. For example, the effect 
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of the proteins on embryonic stem cell differentiation may be evaluated. Numerous assays 
for such activity are familiar to those skilled in the art, including the assays disclosed in the 
following references, which are incorporated herein by reference: Johansson et ah Cellular 
Biology 15:141-151, 1995; Keller et al, Molecular and Cellular Biology 13:473-486, 
5 1993; McClanahan et al., Blood 81 :2903-2915, 1993. 

[0380] The proteins encoded by the extended cDNAs or portions thereof 

may also be evaluated for their influence on the lifetime of stem cells and stem cell 
differentiation. Numerous assays for such activity are familiar to those skilled in the art, 
including the assays disclosed in the following references, which are incorporated herein 

10 by reference: Freshney, M.G. Methylcellulose Colony Forming Assays, in Culture of 
Hematopoietic Cells. R.L Freshney, et al. Eds. pp. 265-268, Wiley-Liss, Inc., New York, 
NY. 1994; Hirayama et al, Proc. Natl. Acad. ScL USA 89:5907-5911, 1992; McNiece, 
LK. and Briddell, R.A. Primitive Hematopoietic Colony Forming Cells with High 
Proliferative Potential, in Culture of Hematopoietic Cells. R.L Freshney, et al. eds. Vol 

15 pp. 23-39, Wiley-Liss, Inc., New York, NY. 1994; Neben et al., Experimental 

Hematology 22:353-359, 1994; Ploemacher, R.E. Cobblestone Area Forming Cell Assay, 
In Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 1-21, Wiley-Liss, Inc., 
New York, NY. 1994; Spooncer, E., Dexter, M. and Allen, T. Long Term Bone Marrow 
Cultures in the Presence of Stromal Cells, in Culture of Hematopoietic Cells. R.I. 

20 Freshney, et al. Eds. pp. 163-179, Wiley-Liss, Inc., New York, NY. 1994; and Sutherland, 
H J. Long Term Culture Initiating Cell Assay, in Culture of Hematopoietic Cells. R.L 
Freshney, et al. Eds. pp. 139-162, Wiley-Liss, Inc., New York, NY. 1994. 

[0381] Those proteins which exhibit hematopoiesis regulatory activity may 

then be formulated as pharmaceuticals and used to treat clinical conditions in which 

25 regulation of hematopoeisis is beneficial For example, a protein of the present invention 

may be useful in regulation of hematopoiesis and, consequently, in the treatment of 
myeloid or lymphoid cell deficiencies. Even marginal biological activity in support of 
colony forming cells or of factor-dependent cell lines indicates involvement in regulating 
hematopoiesis, e.g. in supporting the growth and proliferation of erythroid progenitor cells 

30 alone or in combination with other cytokines, thereby indicating utility, for example, in 

treating various anemias or for use in conjunction with irradiation/chemotherapy to 
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stimulate the production of erythroid precursors and/or erythroid cells; in supporting the 
growth and proliferation of myeloid cells such as granulocytes and 
monocytes/macrophages (i.e., traditional CSF activity) useful, for example, in conjunction 
with chemotherapy to prevent or treat consequent myelo-suppression; in supporting the 
5 growth and proliferation of megakaryocytes and consequently of platelets thereby allowing 
prevention or treatment of various platelet disorders such as thrombocytopenia, and 
generally for use in place of or complimentary to platelet transfusions; and/or in supporting 
the growth and proliferation of hematopoietic stem cells which are capable of maturing to 
any and all of the above-mentioned hematopoietic cells and therefore find therapeutic 

10 utility in various stem cell disorders (such as those usually treated with transplantion, 
including, without limitation, aplastic anemia and paroxysmal nocturnal hemoglobinuria), 
as well as in repopulating the stem cell compartment post irradiation/chemotherapy, either 
in- vivo or ex-vivo (i.e., in conjunction with bone marrow transplantation or with peripheral 
progenitor cell transplantation (homologous or heterologous)) as normal cells or 

15 genetically manipulated for gene therapy. Alternatively, as described in more detail 
below, genes encoding these proteins or nucleic acids regulating the expression of these 
proteins may be introduced into appropriate host cells to increase or decrease the 
expression of the proteins as desired. 

20 EXAMPLE 35 

Assaying the Proteins Expressed from Extended cDNAs or Portion s Thereof for 

Regulation of Tissue Growth 

[0382] The proteins encoded by the extended cDNAs or portions thereof 

may also be evaluated for their effect on tissue growth. Numerous assays for such activity 
25 are familiar to those skilled in the art, including the assays disclosed in International Patent 
Publication No. WO95/16035, International Patent Publication No. WO95/05846 and 
International Patent Publication No. W09 1/07491, which are incorporated herein by 
reference. 

[0383] Assays for wound healing activity include, without limitation, those 

30 described in: Winter, Epidermal Wound Healing , pps. 71-112 (Maibach, HI and Rovee, 
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DT, eds.), Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and 
Mertz, J. Invest. Dermatol 71 :382-84 (1978) which are incorporated herein by reference. 

[0384] Those proteins which are involved in the regulation of tissue 

growth may then be formulated as pharmaceuticals and used to treat clinical conditions in 
5 which regulation of tissue growth is beneficial For example, a protein of the present 
invention also may have utility in compositions used for bone, cartilage, tendon, ligament 
and/or nerve tissue growth or regeneration, as well as for wound healing and tissue repair 
and replacement, and in the treatment of burns, incisions and ulcers. 

[0385] A protein of the present invention, which induces cartilage and/or 

10 bone growth in circumstances where bone is not normally formed, has application in the 
healing of bone fractures and cartilage damage or defects in humans and other animals. 
Such a preparation employing a protein of the invention may have prophylactic use in 
closed as well as open fracture reduction and also in the improved fixation of artificial 
joints. De novo bone formation induced by an osteogenic agent contributes to the repair of 
15 congenital, trauma induced, or oncologic resection induced craniofacial defects, and also is 
useful in cosmetic plastic surgery. 

[0386] A protein of this invention may also be used in the treatment of 

periodontal disease, and in other tooth repair processes. Such agents may provide an 
environment to attract bone-forming cells, stimulate growth of bone-forming cells or 
20 induce differentiation of progenitors of bone-forming cells. A protein of the invention may 
also be useful in the treatment of osteoporosis or osteoarthritis, such as through stimulation 
of bone and/or cartilage repair or by blocking inflammation or processes of tissue 
destruction (collagenase activity, osteoclast activity, etc.) mediated by inflammatory 
processes. 

25 [0387] Another category of tissue regeneration activity that may be 

attributable to the protein of the present invention is tendon/ligament formation. A protein 
of the present invention, which induces tendon/ligament-like tissue or other tissue 
formation in circumstances where such tissue is not normally formed, has application in 
the healing of tendon or ligament tears, deformities and other tendon or ligament defects in 

30 humans and other animals. Such a preparation employing a tendon/ligament-like tissue 
inducing protein may have prophylactic use in preventing damage to tendon or ligament 
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tissue, as well as use in the improved fixation of tendon or ligament to bone or other 
tissues, and in repairing defects to tendon or ligament tissue. De novo tendon/ligament- 
like tissue formation induced by a composition of the present invention contributes to the 
repair of congenital, trauma induced, or other tendon or ligament defects of other origin, 
5 and is also useful in cosmetic plastic surgery for attachment or repair of tendons or 
ligaments. The compositions of the present invention may provide an environment to 
attract tendon- or ligament-forming cells, stimulate growth of tendon- or ligament-forming 
cells, induce differentiation of progenitors of tendon- or ligament-forming cells, or induce 
growth of tendon/ligament cells or progenitors ex vivo for return in vivo to effect tissue 
10 repair. The compositions of the invention may also be useful in the treatment of tendinitis, 
carpal tunnel syndrome and other tendon or ligament defects. The compositions may also 
include an appropriate matrix and/or sequestering agent as a carrier as is well known in the 
art. 

[0388] The protein of the present invention may also be useful for 

15 proliferation of neural cells and for regeneration of nerve and brain tissue, i.e., for the 
treatment of central and peripheral nervous system diseases and neuropathies, as well as 
mechanical and traumatic disorders, which involve degeneration, death or trauma to neural 
cells or nerve tissue. More specifically, a protein may be used in the treatment of diseases 
of the peripheral nervous system, such as peripheral nerve injuries, peripheral neuropathy 
20 and localized neuropathies, and central nervous system diseases, such as Alzheimer's, 
Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager 
syndrome. Further conditions which may be treated in accordance with the present 
invention include mechanical and traumatic disorders, such as spinal cord disorders, head 
trauma and cerebrovascular diseases such as stroke. Peripheral neuropathies resulting 
25 from chemotherapy or other medical therapies may also be treatable using a protein of the 

invention. 

[0389] Proteins of the invention may also be useful to promote better or 

faster closure of non-healing wounds, including without limitation pressure ulcers, ulcers 
associated with vascular insufficiency, surgical and traumatic wounds, and the like. 

30 [0390] It is expected that a protein of the present invention may also 

exhibit activity for generation or regeneration of other tissues, such as organs (including, 
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for example, pancreas, liver, intestine, kidney, skin, endothelium) muscle (smooth, skeletal 
or cardiac) and vascular (including vascular endothelium) tissue, or for promoting the 
growth of cells comprising such tissues. Part of the desired effects may be by inhibition or 
modulation of fibrotic scarring to allow normal tissue to generate. A protein of the 
5 invention may also exhibit angiogenic activity. 

[0391] A protein of the present invention may also be useful for gut 

protection or regeneration and treatment of lung or liver fibrosis, reperfusion injury in 
various tissues, and conditions resulting from systemic cytokine damage. 

[0392] A protein of the present invention may also be useful for promoting 

10 or inhibiting differentiation of tissues described above from precursor tissues or cells; or 
for inhibiting the growth of tissues described above. 

[0393] Alternatively, as described in more detail below, genes encoding 

these proteins or nucleic acids regulating the expression of these proteins may be 
introduced into appropriate host cells to increase or decrease the expression of the proteins 
15 as desired. 

EXAMPLE 36 

Assaying the Proteins Expressed from Extended cDNAs or Portions 
Thereof for Regulation of Reproductive Hormones or Cell Movement 

20 [0394] The proteins encoded by the extended cDNAs or portions thereof 

may also be evaluated for their ability to regulate reproductive hormones, such as follicle 
stimulating hormone. Numerous assays for such activity are familiar to those skilled in the 
art, including the assays disclosed in the following references, which are incorporated 
herein by reference: Vale et al., Endocrinology 91:562-572, 1972; Ling et al, Nature 

25 321:779-782, 1986; Vale et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659- 
663, 1985; Forage et al, Proc. NatL Acad. ScL USA 83:3091-3095, 1986. Chapter 6.12 
(Measurement of Alpha and Beta Chemokines) Current Protocols in Immunology, J.E. 
Coligan et al. Eds. Greene Publishing Associates and Wiley-Intersciece ; Taub et al. J. 
Clin. Invest 95:1370-1376, 1995; Lind et al. APMIS 103:140-146, 1995; Muiler et al. 
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Eur. J. Immunol. 25:1744-1748; Gruber et al. J- of Immunol 152:5860-5867, 1994; 
Johnston et al. J. of Immunol. 153:1762-1768, 1994. 

[0395] Those proteins which exhibit activity as reproductive hormones or 

regulators of cell movement may then be formulated as pharmaceuticals and used to treat 
5 clinical conditions in which regulation of reproductive hormones or cell movement are 
beneficial For example, a protein of the present invention may also exhibit activin- or 
inhibin-related activities. Inhibins are characterized by their ability to inhibit the release of 
follicle stimulating hormone (FSH), while activins are characterized by their ability to 
stimulate the release of folic stimulating hormone (FSH). Thus, a protein of the present 

10 invention, alone or in heterodimers with a member of the inhibin a family, may be useful 
as a contraceptive based on the ability of inhibins to decrease fertility in female mammals 
and decrease spermatogenesis in male mammals. Administration of sufficient amounts of 
other inhibins can induce infertility in these mammals. Alternatively, the protein of the 
invention, as a homodimer or as a heterodimer with other protein subunits of the inhibin-B 

15 group, may be useful as a fertility inducing therapeutic, based upon the ability of activin 

molecules in stimulating FSH release from cells of the anterior pituitary. See, for example, 
United States Patent 4,798,885, the disclosure of which is incorporated herein by 
reference. A protein of the invention may also be useful for advancement of the onset of 
fertility in sexually immature mammals, so as to increase the lifetime reproductive 

20 performance of domestic animals such as cows, sheep and pigs. 

[0396] Alternatively, as described in more detail below, genes encoding 

these proteins or nucleic acids regulating the expression of these proteins may be 
introduced into appropriate host cells to increase or decrease the expression of the proteins 
as desired. 

25 

EXAMPLE 36A 

Assaying the Proteins Expressed from Extended cDNAs or 

Portions Thereof for Chemotactic/Chemokinetic Activity 

[0397] The proteins encoded by the extended cDNAs or portions thereof 

30 may also be evaluated for chemotactic/chemokinetic activity. For example, a protein of 
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the present invention may have chemotactic or chemokinetic activity (e.g., act as a 
chemokine) for mammalian cells, including, for example, monocytes, fibroblasts, 
neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial cells. 
Chemotactic and chmokinetic proteins can be used to mobilize or attract a desired cell 
5 population to a desired site of action. Chemotactic or chemokinetic proteins provide 
particular advantages in treatment of wounds and other trauma to tissues, as well as in 
treatment of localized infections. For example, attraction of lymphocytes, monocytes or 
neutrophils to tumors or sites of infection may result in improved immune responses 
against the tumor or infecting agent. 

10 [0398] A protein or peptide has chemotactic activity for a particular cell 

population if it can stimulate, directly or indirectly, the directed orientation or movement 
of such cell population. Preferably, the protein or peptide has the ability to directly 
stimulate directed movement of cells. Whether a particular protein has chemotactic 
activity for a population of cells can be readily determined by employing such protein or 

15 peptide in any known assay for cell chemotaxis. 

[0399] The activity of a protein of the invention may, among other means, 

be measured by the following methods: 

[0400] Assays for chemotactic activity (which will identify proteins that 

induce or prevent chemotaxis) consist of assays that measure the ability of a protein to 

20 induce the migration of cells across a membrane as well as the ability of a protein to 
induce the adhension of one cell population to another cell population. Suitable assays for 
movement and adhesion include, without limitation, those described in: Current Protocols 
in Immunology, Ed by J.E. Coligan, A.M. Kruisbeek, D.H. Margulies, EJML Shevach, W. 
Strober, Pub. Greene Publishing Associates and Wiley-Interscience (Chapter 6.12, 

25 Measurement of alpha and beta Chemokincs 6.12.1-6.12.28; Taub et al. J. Clin. Invest. 
95:1370-1376, 1995; Lind et al. APMIS 103:140-146, 1995; Mueller et al Eur. J. 
Immunol. 25:17444748; Gruber et al. J. of Immunol. 152:5860-5867, 1994; Johnston et 
al. J. of Immunol, 153:1762-1768, 1994. 
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EXAMPLE 37 

Assaying the Proteins Expressed from Extended cDNAs or 

Portions Thereof for Regulation of Blood Clotting 

[0401] The proteins encoded by the extended cDNAs or portions thereof 

5 may also be evaluated for their effects on blood clotting. Numerous assays for such 
activity are familiar to those skilled in the art, including the assays disclosed in the 
following references, which are incorporated herein by reference: Linet et aL, J. Clin* 
Pharmacol. 26:131-140, 1986; Burdick et aL, Thrombosis Res. 45:413-419, 1987; 
Humphrey et aL, Fibrinolysis 5:71-79 (1991); Schaub, Prostaglandins 35:467-474, 1988. 

10 [0402] Those proteins which are involved in the regulation of blood 

clotting may then be formulated as pharmaceuticals and used to treat clinical conditions in 
which regulation of blood clotting is beneficial. For example, a protein of the invention 
may also exhibit hemostatic or thrombolytic activity. As a result, such a protein is 
expected to be useful in treatment of various coagulations disorders (including hereditary 

15 disorders, such as hemophilias) or to enhance coagulation and other hemostatic events in 

treating wounds resulting from trauma, surgery or other causes. A protein of the invention 
may also be useful for dissolving or inhibiting formation of thromboses and for treatment 
and prevention of conditions resulting therefrom (such as, for example, infarction of 
cardiac and central nervous system vessels (e.g., stroke). Alternatively, as described in 

20 more detail below, genes encoding these proteins or nucleic acids regulating the expression 
of these proteins may be introduced into appropriate host cells to increase or decrease the 
expression of the proteins as desired. 

EXAMPLE 38 

25 Assaying the Protein s Expres sed from Extended cDNAs or 

Portions Thereof for Involvement in Recep tor/Li gand Interactions 

[0403] The proteins encoded by the extended cDNAs or a portion thereof 

may also be evaluated for their involvement in receptor/ligand interactions. Numerous 
assays for such involvement are familiar to those skilled in the art, including the assays 
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disclosed in the following references, which are incorporated herein by reference: Chapter 
7.28 (Measurement of Cellular Adhesion under Static Conditions 7.28.1-7.28.22) in 
Current Protocols in Immunology, J.E. Coligan et al. Eds. Greene Publishing Associates 
and Wiley-Interscience; Takai et ah, Proc. Natl. Acad. ScL USA 84:6864-6868, 1987; 
5 Bierer et al., J. Exp. Med. 168:1 145-1 156, 1988; Rosenstein et al., J. Exp. Med. 169:149- 
160, 1989; Stoltenborg et al, J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 
80:661-670, 1995; Gyuris et al, Cell 75:791-803, 1993. 

[0404] For example, the proteins of the present invention may also 

demonstrate activity as receptors, receptor ligands or inhibitors or agonists of 

10 receptor/ligand interactions. Examples of such receptors and ligands include, without 
limitation, cytokine receptors and their ligands, receptor kinases and their ligands, receptor 
phosphatases and their ligands, receptors involved in cell-cell interactions and their ligands 
(including without limitation, cellular adhesion molecules (such as selectins, integrins and 
their ligands) and receptor/ligand pairs involved in antigen presentation, antigen 

15 recognition and development of cellular and humoral immune respones). Receptors and 
ligands are also useful for screening of potential peptide or small molecule inhibitors of the 
relevant receptor/ligand interaction. A protein of the present invention (including, without 
limitation, fragments of receptors and ligands) may themselves be useful as inhibitors of 
receptor/ligand interactions. 

20 

EXAMPLE 38A 

Assaying the Proteins Ex pressed from Extended cDNAs or Portions 

Thereof for Anti-Inflammatory Activity 

[0405] The proteins encoded by the extended cDNAs or a portion thereof 

25 may also be evaluated for anti-inflammatory activity. The anti-inflammatory activity may 
be achieved by providing a stimulus to cells involved in the inflammatory response, by 
inhibiting or promoting cell-cell interactions (such as, for example, cell adhesion), by 
inhibiting or promoting chemotaxis of cells involved in the inflammatory process, 
inhibiting or promoting cell extravasation, or by stimulating or suppressing production of 
30 other factors which more directly inhibit or promote an inflammatory response. Proteins 
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exhibiting such activities can be used to treat inflammatory conditions including chronic or 
acute conditions), including without limitation inflammation associated with infection 
(such as septic shock, sepsis or systemic inflammatory response syndrome 
(SIRS))4schemia-reperfusioninury, endotoxin lethality, arthritis, complement-mediated 
5 hyperacute rejection, nephritis, cytokine or chemokine-induced lung injury, inflammatory 
bowel disease, Crohn's disease or resulting from over production of cytokines such as TNF 
or IL-L Proteins of the invention may also be useful to treat anaphylaxis and 
hypersensitivity to an antigenic substance or material. 

10 EXAMPLE 38B 

Assaying the Proteins Expressed from Extended cDNAs or 

Portions Thereof for Tumor Inhibition Activity 

[0406] The proteins encoded by the extended cDNAs or a portion thereof 

may also be evaluated for tumor inhibition activity. In addition to the activities described 

15 above for immunological treatment or prevention of tumors, a protein of the invention may 
exhibit other anti-tumor activities. A protein may inhibit tumor growth directly or 
indirectly (such as, for example, via ADCC). A protein may exhibit its tumor inhibitory 
activity by acting on tumor tissue or tumor precursor tissue, by inhibiting formation of 
tissues necessary to support tumor growth (such as, for example, by inhibiting 

20 angiogenesis), by causing production of other factors, agents or cell types which inhibit 
tumor growth, or by suppressing, eliminating or inhibiting factors, agents or cell types 
which promote tumor growth. 

[0407] A protein of the invention may also exhibit one or more of the 

following additional activities or effects: inhibiting the growth, infection or function of, or 

25 killing, infectious agents, including, without limitation, bacteria, viruses, fungi and other 
parasites; effecting (suppressing or enhancing) bodily characteristics, including, without 
limitation, height, weight, hair color, eye color, skin, fat to lean ratio or other tissue 
pigmentation, or organ or body part size or shape (such as, for example, breast 
augmentation or diminution, change in bone form or shape); effecting biorhythms or 

30 circadian cycles or rhythms; effecting the fertility of male or female subjects; effecting the 



-111- 



36.US3.DIV 

metabolism, catabolism, anabolism, processing, utilization, storage or elimination of 
dietary fat, lipid, protein, carbohydrate, vitamins, minerals, cofactors or other nutritional 
factors or component(s); effecting behavioral characteristics, including, without limitation, 
appetite, libido, stress, cognition (including cognitive disorders), depression (including 
5 depressive disorders) and violent behaviors; providing analgesic effects or other pain 
reducing effects; promoting differentiation and growth of embryonic stem cells in lineages 
other than hematopoietic lineages; hormonal or endocrine activity; in the case of enzymes, 
correcting deficiencies of the enzyme and treating deficiency-related diseases; treatment of 
hyperproliferative disorders (such as, for example, psoriasis); immunoglobulin-like activity 
10 (such as, for example, the ability to bind antigens or complement); and the ability to act as 
an antigen in a vaccine composition to raise an immune response against such protein or 
another material or entity which is cross-reactive with such protein. 

EXAMPLE 39 

15 Identification of Proteins which Interact with 

Polypeptides Encoded by Extended cDNAs 

[0408] Proteins which interact with the polypeptides encoded by extended 

cDNAs or portions thereof, such as receptor proteins, may be identified using two hybrid 
systems such as the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). 

20 As described in the manual accompanying the Matchmaker Two Hybrid System 2 
(Catalog No. Kl 604-1, Clontech), which is incorporated herein by reference, the extended 
cDNAs or portions thereof, are inserted into an expression vector such that they are in 
frame with DNA encoding the DNA binding domain of the yeast transcriptional activator 
GALA cDNAs in a cDNA library which encode proteins which might interact with the 

25 polypeptides encoded by the extended cDNAs or portions thereof are inserted into a 
second expression vector such that they are in frame with DNA encoding the activation 
domain of GAL4. The two expression plasmids are transformed into yeast and the yeast 
are plated on selection medium which selects for expression of selectable markers on each 
of the expression vectors as well as GAL4 dependent expression of the HIS3 gene. 

30 Transformants capable of growing on medium lacking histidine are screened for GAL4 
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dependent lacZ expression. Those cells which are positive in both the histidine selection 
and the lacZ assay contain plasmids encoding proteins which interact with the polypeptide 
encoded by the extended cDNAs or portions thereof. 

[0409] Alternatively, the system described in Lustig et al., Methods in 

5 Enzyrnology 283: 83-99 (1997), the disclosure of which is incorporated herein by 
reference, may be used for identifying molecules which interact with the polypeptides 
encoded by extended cDNAs. In such systems, in vitro transcription reactions are 
performed on a pool of vectors containing extended cDNA inserts cloned downstream of a 
promoter which drives in vitro transcription. The resulting pools of mRNAs are 
10 introduced into Xenopus laevis oocytes. The oocytes are then assayed for a desired 
acitivity. 

[0410] Alternatively, the pooled in vitro transcription products produced as 

described above may be translated in vitro. The pooled in vitro translation products can be 
assayed for a desired activity or for interaction with a known polypeptide. 

15 [0411] Proteins or other molecules interacting with polypeptides 

encoded by extended cDNAs can be found by a variety of additional techniques. In one 
method, affinity columns containing the polypeptide encoded by the extended cDNA or 
a portion thereof can be constructed. In some versions, of this method the affinity 
column contains chimeric proteins in which the protein encoded by the extended cDNA 

20 or a portion thereof is fused to glutathione S-transferase. A mixture of cellular proteins 
or pool of expressed proteins as described above and is applied to the affinity column. 
Proteins interacting with the polypeptide attached to the column can then be isolated 
and analyzed on 2-D electrophoresis gel as described in Ramunsen et al. 
Electrophoresis, 18, 588-598 (1997), the disclosure of which is incorporated herein by 

25 reference. Alternatively, the proteins retained on the affinity column can be purified by 
electrophoresis based methods and sequenced. The same method can be used to isolate 
antibodies, to screen phage display products, or to screen phage display human 
antibodies. 

[0412] Proteins interacting with polypeptides encoded by extended 

30 cDNAs or portions thereof can also be screened by using an Optical Biosensor as 

described in Edwards & Leatherbarrow, Analytical Biochemistry, 246, 1-6 (1997), the 
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disclosure of which is incorporated herein by reference. The main advantage of the 
method is that it allows the determination of the association rate between the protein 
and other interacting molecules. Thus, it is possible to specifically select interacting 
molecules with a high or low association rate. Typically a target molecule is linked to 
5 the sensor surface (through a carboxymethl dextran matrix) and a sample of test 
molecules is placed in contact with the target molecules. The binding of a test molecule 
to the target molecule causes a change in the refractive index and/ or thickness. This 
change is detected by the Biosensor provided it occurs in the evanescent field (which 
extend a few hundred manometers from the sensor surface). In these screening assays, 
10 the target molecule can be one of the polypeptides encoded by extended cDNAs or a 
portion thereof and the test sample can be a collection of proteins extracted from tissues 
or cells, a pool of expressed proteins, combinatorial peptide and/ or chemical 
libraries,or phage displayed peptides. The tissues or cells from which the test proteins 
are extracted can originate from any species. 

15 [0413] In other methods, a target protein is immobilized and the test 

population is a collection of unique polypeptides encoded by the extended cDNAs or 
portions thereof. 

[0414] To study the interaction of the proteins encoded by the extended 

cDNAs or portions thereof with drugs, the microdialysis coupled to HPLC method 
20 described by Wang et al, Chromatographic 44, 205-208(1997) or the affinity capillary 
electrophoresis method described by Busch et al, J. Chromatogr. 777:311-328 (1997), 
the disclosures of which are incorporated herein by referenc can be used. 

[0415] The system described in U.S. Patent No. 5,654,150, the disclosure 

of which is incorporated herein by reference, may also be used to identify molecules which 
25 interact with the polypeptides encoded by the extended cDNAs. In this system, pools of 

extended cDNAs are transcribed and translated in vitro and the reaction products are 
assayed for interaction with a known polypeptide or antibody. 

[0416] It will be appreciated by those skilled in the art that the proteins 

expressed from the extended cDNAs or portions may be assayed for numerous activities in 
30 addition to those specifically enumerated above. For example, the expressed proteins may 

be evaluated for applications involving control and regulation of inflammation, tumor 
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proliferation or metastasis, infection, or other clinical conditions. Li addition, the proteins 
expressed from the extended cDNAs or portions thereof may be useful as nutritional 
agents or cosmetic agents. 

[0417] The proteins expressed from the extended cDNAs or portions 

thereof may be used to generate antibodies capable of specifically binding to the expressed 
protein or fragments thereof as described in Example 40 below. The antibodies may be 
capable of binding a full length protein encoded by one of the sequences of SEQ ID NOs: 
40-59, 61-73, 75, 77-82, and 130-154, a mature protein encoded by one of the sequences 
of SEQ ID NOs. 40-59, 61-75, 77-82, and 130-154, or a signal peptide encoded by one of 
the sequences of SEQ ID Nos. 40-59, 61-73, 75-82, 84 and 130-154. Alternatively, the 
antibodies may be capable of binding fragments of the proteins expressed from the 
extended cDNAs which comprise at least 10 amino acids of the sequences of SEQ ID 
NOs: 85-129 and 155-179. In some embodiments, the antibodies may be capable of 
binding fragments of the proteins expressed from the extended cDNAs which comprise at 
least 15 amino acids of the sequences of SEQ ID NOs: 85-129 and 155-179. In other 
embodiments, the antibodies may be capable of binding fragments of the proteins 
expressed from the extended cDNAs which comprise at least 25 amino acids of the 
sequences of SEQ ID NOs: 85-129 and 155-179. In further embodiments, the antibodies 
may be capable of binding fragments of the proteins expressed from the extended cDNAs 
which comprise at least 40 amino acids of the sequences of SEQ ID NOs: 85-129 and 155- 
179. 
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EXAMPLE 40 

Production of an Antibody to a Human Protein 

[0418] Substantially pure protein or polypeptide is isolated from the 

transfected or transformed cells as described in Example 30. The concentration of protein 
5 in the final preparation is adjusted, for example, by concentration on an Amicon filter 
device, to the level of a few micrograms/ml Monoclonal or polyclonal antibody to the 
protein can then be prepared as follows: 

A. Monoclonal Antibody Production by Hybridoma Fusion 

[0419] Monoclonal antibody to epitopes of any of the peptides identified 

10 and isolated as described can be prepared from murine hybridomas according to the 

classical method of Kohler, G. and Milstein, C, Nature 256:495 (1975) or derivative 
methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the 
selected protein or peptides derived therefrom over a period of a few weeks. The mouse is 
then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells 
15 are fused by means of polyethylene glycol with mouse myeloma cells, and the excess 
unflised cells destroyed by growth of the system on selective media comprising 
aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the 
dilution placed in wells of a microtiter plate where growth of the culture is continued. 
Antibody-producing clones are identified by detection of antibody in the supernatant fluid 
20 of the wells by immunoassay procedures, such as Elisa, as originally described by Engvall, 
E., Meth. EnzymoL 70:419 (1980), and derivative methods thereof. Selected positive 
clones can be expanded and their monoclonal antibody product harvested for use. Detailed 
procedures for monoclonal antibody production are described in Davis, L. et al. Basic 
Methods in Molecular Biology Elsevier, New York. Section 21-2. 

25 

B. Polyclonal Antibody Production by Immunization 

[0420] Polyclonal antiserum containing antibodies to heterogeneous 

epitopes of a single protein can be prepared by immunizing suitable animals with the 
expressed protein or peptides derived therefrom described above, which can be unmodified 
30 or modified to enhance immunogenicity. Effective polyclonal antibody production is 
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affected by many factors related both to the antigen and the host species. For example, 
small molecules tend to be less immunogenic than others and may require the use of 
carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, 
with both inadequate or excessive doses of antigen resulting in low titer antisera. Small 
5 doses (ng level) of antigen administered at multiple intradermal sites appears to be most 
reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et 
al. J. Clin. Endocrinol. Metab. 33:988-991 (1971). 

[0421] Booster injections can be given at regular intervals, and antiserum 

harvested when antibody titer thereof, as determined semi-quantitatively, for example, by 

10 double immunodiffusion in agar against known concentrations of the antigen, begins to 

fall. See, for example, Ouchterlony, O. et al, Chap. 19 in: Handbook of Experimental 
Immunology D. Wier (ed) Blackwell (1973). Plateau concentration of antibody is usually 
in the range of 0.1 to 0.2 mg/ml of serum (about 12 \xM). Affinity of the antisera for the 
antigen is determined by preparing competitive binding curves, as described, for example, 

15 by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, 
Eds.) Amer. Soc. For Microbiol., Washington, D.C. (1980). 

[0422] Antibody preparations prepared according to either protocol are 

useful in quantitative immunoassays which determine concentrations of antigen-bearing 
substances in biological samples; they are also used semi-quantitatively or qualitatively to 
20 identify the presence of antigen in a biological sample. The antibodies may also be used in 
therapeutic compositions for killing cells expressing the protein or reducing the levels of 
the protein in the body. 

V. Use of Extended cDNAs or Portions Thereof as Reagents 

[0423] The extended cDNAs of the present invention may be used as 

25 reagents in isolation procedures, diagnostic assays, and forensic procedures. For example, 

sequences from the extended cDNAs (or genomic DNAs obtainable therefrom) may be 
detectably labeled and used as probes to isolate other sequences capable of hybridizing to 
them. In addition, sequences from the extended cDNAs (or genomic DNAs obtainable 
therefrom) may be used to design PCR primers to be used in isolation, diagnostic, or 
30 forensic procedures. 
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EXAMPLE 41 

Preparation of PCR Primers and Amplification of DNA 

[0424] The extended cDNAs (or genomic DNAs obtainable therefrom) 

5 may be used to prepare PCR primers for a variety of applications, including isolation 
procedures for cloning nucleic acids capable of hybridizing to such sequences, diagnostic 
techniques and forensic techniques. The PCR primers are at least 10 bases, and preferably 
at least 12, 15, or 17 bases in length. More preferably, the PCR primers are at least 20-30 
bases in length. In some embodiments, the PCR primers may be more than 30 bases in 

10 length. It is preferred that the primer pairs have approximately the same G/C ratio, so that 

melting temperatures are approximately the same. A variety of PCR techniques are 
familiar to those skilled in the art. For a review of PCR technology, see Molecular 
Cloning to Genetic Engineering White, B.A. Ed. in Methods in Molecular Biology 67: 
Humana Press, Totowa 1997. In each of these PCR procedures, PCR primers on either 

15 side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic 

acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, 
Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the 
PCR primers are specifically hybridized to complementary nucleic acid sequences in the 
sample. The hybridized primers are extended. Thereafter, another cycle of denaturation, 

20 hybridization, and extension is initiated. The cycles are repeated multiple times to produce 
an amplified fragment containing the nucleic acid sequence between the primer sites. 

EXAMPLE 42 

Use of Extended cDNAs as Probes 

25 [0425] Probes derived from extended cDNAs or portions thereof (or 

genomic DNAs obtainable therefrom) may be labeled with detectable labels familiar to 
those skilled in the art, including radioisotopes and non-radioactive labels, to provide a 
detectable probe. The detectable probe may be single stranded or double stranded and 
may be made using techniques known in the art, including in vitro transcription, nick 

30 translation, or kinase reactions. A nucleic acid sample containing a sequence capable of 
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hybridizing to the labeled probe is contacted with the labeled probe. If the nucleic acid in 
the sample is double stranded, it may be denatured prior to contacting the probe. In some 
applications, the nucleic acid sample may be immobilized on a surface such as a 
nitrocellulose or nylon membrane. The nucleic acid sample may comprise nucleic acids 
5 obtained from a variety of sources, including genomic DNA, cDNA libraries, RNA, or 

tissue samples. 

[0426] Procedures used to detect the presence of nucleic acids capable of 

hybridizing to the detectable probe include well known techniques such as Southern 
blotting, Northern blotting, dot blotting, colony hybridization, and plaque hybridization. In 

10 some applications, the nucleic acid capable of hybridizing to the labeled probe may be 
cloned into vectors such as expression vectors, sequencing vectors, or in vitro transcription 
vectors to facilitate the characterization and expression of the hybridizing nucleic acids in 
the sample. For example, such techniques may be used to isolate and clone sequences in a 
genomic library or cDNA library which are capable of hybridizing to the detectable probe 

15 as described in Example 30 above. 

[0427] PCR primers made as described in Example 41 above may be used 

in forensic analyses, such as the DNA fingerprinting techniques described in Examples 43- 
47 below. Such analyses may utilize detectable probes or primers based on the sequences 
of the extended cDNAs isolated using the 5' ESTs (or genomic DNAs obtainable 
20 therefrom). 

EXAMPLE 43 

Forensic Matching by DNA Sequencing 

[0428] In one exemplary method, DNA samples are isolated from forensic 

25 specimens of, for example, hair, semen, blood or skin cells by conventional methods. A 

panel of PCR primers based on a number of the extended cDNAs (or genomic DNAs 
obtainable therefrom), is then utilized in accordance with Example 41 to amplify DNA of 
approximately 100-200 bases in length from the forensic specimen. Corresponding 
sequences are obtained from a test subject. Each of these identification DNAs is then 
30 sequenced using standard techniques, and a simple database comparison determines the 
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differences, if any, between the sequences from the subject and those from the sample. 
Statistically significant differences between the suspect's DNA sequences and those from 
the sample conclusively prove a lack of identity. This lack of identity can be proven, for 
example, with only one sequence. Identity, on the other hand, should be demonstrated 
5 with a large number of sequences, all matching. Preferably, a minimum of 50 statistically 
identical sequences of 100 bases in length are used to prove identity between the suspect 
and the sample. 

EXAMPLE 44 

10 Positive Identification by DNA Sequencing 

[0429] The technique outlined in the previous example may also be used 

on a larger scale to provide a unique fingerprint-type identification of any individual. In 
this technique, primers are prepared from a large number of sequences from Table IV and 
the appended sequence listing. Preferably, 20 to 50 different primers are used. These 

15 primers are used to obtain a corresponding number of PCR-generated DNA segments from 

the individual in question in accordance with Example 41 . Each of these DNA segments is 
sequenced, using the methods set forth in Example 43. The database of sequences 
generated through this procedure uniquely identifies the individual from whom the 
sequences were obtained. The same panel of primers may then be used at any later time to 

20 absolutely correlate tissue or other biological specimen with that individual. 

EXAMPLE 45 

Southern Blot Forensic Identification 

[0430] The procedure of Example 44 is repeated to obtain a panel of at 

25 least 10 amplified sequences from an individual and a specimen. Preferably, the panel 
contains at least 50 amplified sequences. More preferably, the panel contains 100 
amplified sequences. In some embodiments, the panel contains 200 amplified sequences. 
This PCR-generated DNA is then digested with one or a combination of, preferably, four 
base specific restriction enzymes. Such enzymes are commercially available and known to 
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those of skill in the art. After digestion, the resultant gene fragments are size separated in 
multiple duplicate wells on an agarose gel and transferred to nitrocellulose using Southern 
blotting techniques well known to those with skill in the art. For a review of Southern 
blotting see Davis et al (Basic Methods in Molecular Biology , 1986, Elsevier Press, pp 
5 62-65). 

[0431] A panel of probes based on the sequences of the extended cDNAs 

(or genomic DNAs obtainable therefrom), or fragments thereof of at least 10 bases, are 
radioactively or colorimetrically labeled using methods known in the art, such as nick 
translation or end labeling, and hybridized to the Southern blot using techniques known in 

10 the art (Davis et al., supra) . Preferably, the probe comprises at least 12, 15, or 17 
consecutive nucleotides from the extended cDNA (or genomic DNAs obtainable 
therefrom). More preferably, the probe comprises at least 20-30 consecutive nucleotides 
from the extended cDNA (or genomic DNAs obtainable therefrom). In some 
embodiments, the probe comprises more than 30 nucleotides from the extended cDNA (or 

15 genomic DNAs obtainable therefrom). In other embodiments, the probe comprises at least 

40, at least 50, at least 75, at least 100, at least 150, or at least 200 consecutive nucleotides 
from the extended cDNA (or genomic DNAs obtainable therefrom). 

[0432] Preferably, at least 5 to 10 of these labeled probes are used, and 

more preferably at least about 20 or 30 are used to provide a unique pattern. The resultant 

20 bands appearing from the hybridization of a large sample of extended cDNAs (or genomic 
DNAs obtainable therefrom) will be a unique identifier. Since the restriction enzyme 
cleavage will be different for every individual, the band pattern on the Southern blot will 
also be unique. Increasing the number of extended cDNA probes will provide a 
statistically higher level of confidence in the identification since there will be an increased 

25 number of sets of bands used for identification. 
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EXAMPLE 46 

Dot Blot Identification Procedure 

[0433] Another technique for identifying individuals using the extended 

cDNA sequences disclosed herein utilizes a dot blot hybridization technique. 

5 [0434] Genomic DNA is isolated from nuclei of subject to be identified. 

Oligonucleotide probes of approximately 30 bp in length are synthesized that correspond 
to at least 10, preferably 50 sequences from the extended cDNAs or genomic DNAs 
obtainable therefrom. The probes are used to hybridize to the genomic DNA through 
conditions known to those in the art. The oligonucleotides are end labeled with P 32 using 

10 polynucleotide kinase (Pharmacia). Dot Blots are created by spotting the genomic DNA 
onto nitrocellulose or the like using a vacuum dot blot manifold (BioRad, Richmond 
California). The nitrocellulose filter containing the genomic sequences is baked or UV 
linked to the filter, prehybridized and hybridized with labeled probe using techniques 
known in the art (Davis et al. supra ). The 32 P labeled DNA fragments are sequentially 

15 hybridized with successively stringent conditions to detect minimal differences between 

the 30 bp sequence and the DNA. Tetramethylammonium chloride is useful for 
identifying clones containing small numbers of nucleotide mismatches (Wood et al., Proc. 
Natl. Acad. Sci. USA 82(6):1 585-1 588 (1985)) which is hereby incorporated by reference. 
A unique pattern of dots distinguishes one individual from another individual. 

20 [0435] Extended cDNAs or oligonucleotides containing at least 10 

consecutive bases from these sequences can be used as probes in the following alternative 
fingerprinting technique. Preferably, the probe comprises at least 12, 15, or 17 consecutive 
nucleotides from the extended cDNA (or genomic DNAs obtainable therefrom). More 
preferably, the probe comprises at least 20-30 consecutive nucleotides from the extended 

25 cDNA (or genomic DNAs obtainable therefrom). In some embodiments, the probe 

comprises more than 30 nucleotides from the extended cDNA (or genomic DNAs 
obtainable therefrom). In other embodiments, the probe comprises at least 40, at least 50, 
at least 75, at least 100, at least 150, or at least 200 consecutive nucleotides from the 
extended cDNA (or genomic DNAs obtainable therefrom). 
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[0436] Preferably, a plurality of probes having sequences from different 

genes are used in the alternative fingerprinting technique. Example 47 below provides a 
representative alternative fingerprinting procedure in which the probes are derived from 
extended cDNAs. 

5 

EXAMPLE 47 

Alternative "Fingerprint" Identification Technique 

[0437] 20-mer oligonucleotides are prepared from a large number, e.g. 50, 

100, or 200, of extended cDNA sequences (or genomic DNAs obtainable therefrom) using 

10 commercially available oligonucleotide services such as Genset, Paris, France. Cell 
samples from the test subject are processed for DNA using techniques well known to those 
with skill in the art. The nucleic acid is digested with restriction enzymes such as EcoRI 
and XbaL Following digestion, samples are applied to wells for electrophoresis. The 
procedure, as known in the art, may be modified to accommodate polyacrylamide 

15 electrophoresis, however in this example, samples containing 5 ug of DNA are loaded into 

wells and separated on 0.8% agarose gels. The gels are transferred onto nitrocellulose 
using standard Southern blotting techniques. 

[0438] 10 ng of each of the oligonucleotides are pooled and end-labeled 

with P 32 . The nitrocellulose is prehybridized with blocking solution and hybridized with 

20 the labeled probes. Following hybridization and washing, the nitrocellulose filter is 
exposed to X-Omat AR X-ray film. The resulting hybridization pattern will be unique for 
each individual. 

[0439] It is additionally contemplated within this example that the number 

of probe sequences used can be varied for additional accuracy or clarity. 

25 [0440] The antibodies generated in Examples 30 and 40 above may be 

used to identify the tissue type or cell species from which a sample is derived as described 
above. 
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EXAMPLE 48 

Identification of Tissue Types or Cell Species by Means of 

Labeled Tissue Specific Antibodies 

[0441] Identification of specific tissues is accomplished by the 

5 visualization of tissue specific antigens by means of antibody preparations according to 

Examples 30 and 40 which are conjugated, directly or indirectly to a detectable marker. 
Selected labeled antibody species bind to their specific antigen binding partner in tissue 
sections, cell suspensions, or in extracts of soluble proteins from a tissue sample to provide 
a pattern for qualitative or semi-qualitative interpretation. 

10 [0442] Antisera for these procedures must have a potency exceeding that 

of the native preparation, and for that reason, antibodies are concentrated to a mg/ml level 
by isolation of the gamma globulin fraction, for example, by ion-exchange 
chromatography or by ammonium sulfate fractionation. Also, to provide the most specific 
antisera, unwanted antibodies, for example to common proteins, must be removed from the 

15 gamma globulin fraction, for example by means of insoluble immunoabsorbents, before 

the antibodies are labeled with the marker. Either monoclonal or heterologous antisera is 
suitable for either procedure. 

A. Immunohistochemical Techniques 

[0443] Purified, high-titer antibodies, prepared as described above, are 

20 conjugated to a detectable marker, as described, for example, by Fudenberg, H., Chap. 26 

in: Basic 503 Clinical Immunology, 3rd Ed. Lange, Los Altos, California (1980) or Rose, 
N. et al, Chap. 12 in: Methods in Immunodiagnosis, 2d Ed. John Wiley 503 Sons, New 
York (1980). 

[0444] A fluorescent marker, either fluorescein or rhodamine, is preferred, 

25 but antibodies can also be labeled with an enzyme that supports a color producing reaction 
with a substrate, such as horseradish peroxidase. Markers can be added to tissue-bound 
antibody in a second step, as described below. Alternatively, the specific antitissue 
antibodies can be labeled with ferritin or other electron dense particles, and localization of 
the ferritin coupled antigen-antibody complexes achieved by means of an electron 
30 microscope. In yet another approach, the antibodies are radiolabeled, with, for example 
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125 I, and detected by overlaying the antibody treated preparation with photographic 
emulsion. 

[0445] Preparations to carry out the procedures can comprise monoclonal 

or polyclonal antibodies to a single protein or peptide identified as specific to a tissue type, 
for example, brain tissue, or antibody preparations to several antigenically distinct tissue 
specific antigens can be used in panels, independently or in mixtures, as required. 

[0446] Tissue sections and cell suspensions are prepared for 

immunohistochemical examination according to common histological techniques. 
Multiple cryostat sections (about 4 um, unfixed) of the unknown tissue and known control, 
are mounted and each slide covered with different dilutions of the antibody preparation. 
Sections of known and unknown tissues should also be treated with preparations to 
provide a positive control, a negative control, for example, pre-immune sera, and a control 
for non-specific staining, for example, buffer. 

[0447] Treated sections are incubated in a humid chamber for 30 min at 

room temperature, rinsed, then washed in buffer for 30-45 min. Excess fluid is blotted 
away, and the marker developed. 

[0448] If the tissue specific antibody was not labeled in the first 

incubation, it can be labeled at this time in a second antibody-antibody reaction, for 
example, by adding fluorescein- or enzyme-conjugated antibody against the 
immunoglobulin class of the antiserum-producing species, for example, fluorescein 
labeled antibody to mouse IgG. Such labeled sera are commercially available. 

[0449] The antigen found in the tissues by the above procedure can be 

quantified by measuring the intensity of color or fluorescence on the tissue section, and 
calibrating that signal using appropriate standards. 

B. Identification of Tissue Specific Soluble Proteins 

[0450] The visualization of tissue specific proteins and identification of 

unknown tissues from that procedure is carried out using the labeled antibody reagents and 
detection strategy as described for immunohistochemistry; however the sample is prepared 
according to an electrophoretic technique to distribute the proteins extracted from the 
tissue in an orderly array on the basis of molecular weight for detection. 
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[0451] A tissue sample is homogenized using a Virtis apparatus; cell 

suspensions are disrupted by Dounce homogenization or osmotic lysis, using detergents in 
either case as required to disrupt cell membranes, as is the practice in the art. Insoluble 
cell components such as nuclei, microsomes, and membrane fragments are removed by 
ultxacentrifugation, and the soluble protein-containing fraction concentrated if necessary 
and reserved for analysis. 

[0452] A sample of the soluble protein solution is resolved into individual 

protein species by conventional SDS polyacrylamide electrophoresis as described, for 
example, by Davis, L. et al, Section 19-2 in: Basic Methods in Molecular Biology (P. 
Leder, ed), Elsevier, New York (1986), using a range of amounts of polyacrylamide in a 
set of gels to resolve the entire molecular weight range of proteins to be detected in the 
sample. A size marker is run in parallel for purposes of estimating molecular weights of 
the constituent proteins. Sample size for analysis is a convenient volume of from 5 to55 
(al, and containing from about 1 to 100 |ag protein. An aliquot of each of the resolved 
proteins is transferred by blotting to a nitrocellulose filter paper, a process that maintains 
the pattern of resolution. Multiple copies are prepared. The procedure, known as Western 
Blot Analysis, is well described in Davis, L. et al., (above) Section 19-3. One set of 
nitrocellulose blots is stained with Coomassie Blue dye to visualize the entire set of 
proteins for comparison with the antibody bound proteins. The remaining nitrocellulose 
filters are then incubated with a solution of one or more specific antisera to tissue specific 
proteins prepared as described in Examples 30 and 40. In this procedure, as in procedure A 
above, appropriate positive and negative sample and reagent controls are run. 

[0453] In either procedure A or B, a detectable label can be attached to the 

primary tissue antigen-primary antibody complex according to various strategies and 
permutations thereof. In a straightforward approach, the primary specific antibody can be 
labeled; alternatively, the unlabeled complex can be bound by a labeled secondary anti- 
IgG antibody. In other approaches, either the primary or secondary antibody is conjugated 
to a biotin molecule, which can, in a subsequent step, bind an avidin conjugated marker. 
According to yet another strategy, enzyme labeled or radioactive protein A, which has the 
property of binding to any IgG, is bound in a final step to either the primary or secondary 
antibody. 
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[0454] The visualization of tissue specific antigen binding at levels above 

those seen in control tissues to one or more tissue specific antibodies, prepared from the 
gene sequences identified from extended cDNA sequences, can identify tissues of 
unknown origin, for example, forensic samples, or differentiated tumor tissue that has 
metastasized to foreign bodily sites. 

[0455] In addition to their applications in forensics and identification, 

extended cDNAs (or genomic DNAs obtainable therefrom) may be mapped to their 
chromosomal locations. Example 49 below describes radiation hybrid (RH) mapping of 
human chromosomal regions using extended cDNAs. Example 50 below describes a 
representative procedure for mapping an extended cDNA (or a genomic DNA obtainable 
therefrom) to its location on a human chromosome. Example 5 1 below describes mapping 
of extended cDNAs (or genomic DNAs obtainable therefrom) on metaphase chromosomes 
by Fluorescence In Situ Hybridization (FISH). 
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EXAMPLE 49 

Radiation hybrid mapping of Extended cDNAs to the human genome 
[0456] Radiation hybrid (RH) mapping is a somatic cell genetic approach 

that can be used for high resolution mapping of the human genome. In this approach, cell 
lines containing one or more human chromosomes are lethally irradiated, breaking each 
chromosome into fragments whose size depends on the radiation dose. These fragments 
are rescued by fusion with cultured rodent cells, yielding subclones containing different 
portions of the human genome. This technique is described by Benham et al. {Genomics 
4:509-517, 1989) and Cox et al., {Science 250:245-250, 1990), the entire contents of which 
are hereby incorporated by reference. The random and independent nature of the 
subclones permits efficient mapping of any human genome marker. Human DNA isolated 
from a panel of 80-100 cell lines provides a mapping reagent for ordering extended 
cDNAs (or genomic DNAs obtainable therefrom). In this approach, the frequency of 
breakage between markers is used to measure distance, allowing construction of fine 
resolution maps as has been done using conventional ESTs (Schuler et al., Science 
274:540-546, 1996, hereby incorporated by reference). 

[0457] RH mapping has been used to generate a high-resolution whole 

genome radiation hybrid map of human chromosome 17q22-q25.3 across the genes for 
growth hormone (GH) and thymidine kinase (TK) (Foster et al., Genomics 33:185-192, 
1996), the region surrounding the Gorlin syndrome gene (Obermayr et al., Eur. J. Hum. 
Genet. 4:242-245, 1996), 60 loci covering the entire short arm of chromosome 12 
(Raeymaekers et al., Genomics 29:170-178, 1995), the region of human chromosome 22 
containing the neurofibromatosis type 2 locus (Frazer et al., Genomics 14:574-584, 1992) 
and 13 loci on the long arm of chromosome 5 (Warrington et al., Genomics 11:701-708, 
1991). 
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EXAMPLE 50 

Mapping of Extended cDNAs to Human 

Chromosomes using PCR techniques 

[0458] Extended cDNAs (or genomic DNAs obtainable therefrom) may be 

5 assigned to human chromosomes using PCR based methodologies. In such approaches, 
oligonucleotide primer pairs are designed from the extended cDNA sequence (or the 
sequence of a genomic DNA obtainable therefrom) to minimize the chance of amplifying 
through an intron. Preferably, the oligonucleotide primers are 1 8-23 bp in length and are 
designed for PCR amplification. The creation of PCR primers from known sequences is 
10 well known to those with skill in the art. For a review of PCR technology see Erlich, H.A. ? 
PCR Technology; Principles and Applications for DNA Amplification , 1992. W.H. 
Freeman and Co., New York. 

[0459] The primers are used in polymerase chain reactions (PCR) to 

amplify templates from total human genomic DNA. PCR conditions are as follows: 60 ng 

15 of genomic DNA is used as a template for PCR with 80 ng of each oligonucleotide primer, 
0.6 unit of Taq polymerase, and 1 ^iCu of a 32 P-labeled deoxycytidine triphosphate. The 
PCR is performed in a microplate theimocycler (Techne) under the following conditions: 
30 cycles of 94°C ? 1.4 min; 55°C, 2 min; and 72°Q 2 min; with a final extension at 72°C 
for 10 min. The amplified products are analyzed on a 6% polyacrylamide sequencing gel 

20 and visualized by autoradiography. If the length of the resulting PCR product is identical 
to the distance between the ends of the primer sequences in the extended cDNA from 
which the primers are derived, then the PCR reaction is repeated with DNA templates 
from two panels of human-rodent somatic cell hybrids, BIOS PCRable DNA (BIOS 
Corporation) and NIGMS Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 

25 (NIGMS, Camden, NJ). 

[0460] PCR is used to screen a series of somatic cell hybrid cell lines 

containing defined sets of human chromosomes for the presence of a given extended 
cDNA (or genomic DNA obtainable therefrom). DNA is isolated from the somatic 
hybrids and used as starting templates for PCR reactions using the primer pairs from the 
30 extended cDNAs (or genomic DNAs obtainable therefrom). Only those somatic cell 
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hybrids with chromosomes containing the human gene corresponding to the extended 
cDNA (or genomic DNA obtainable therefrom) will yield an amplified fragment. The 
extended cDNAs (or genomic DNAs obtainable therefrom) are assigned to a chromosome 
by analysis of the segregation pattern of PCR products from the somatic hybrid DNA 
templates. The single human chromosome present in all cell hybrids that give rise to an 
amplified fragment is the chromosome containing that extended cDNA (or genomic DNA 
obtainable therefrom). For a review of techniques and analysis of results from somatic cell 
gene mapping experiments. (See Ledbetter et al., Genomics 6:475-481 (1990).) 

[0461] Alternatively, the extended cDNAs (or genomic DNAs obtainable 

therefrom) may be mapped to individual chromosomes using FISH as described in 
Example 51 below. 



EXAMPLE 51 

Mapping of Extended 5' ESTs to Chromosomes 

Using Fluorescence in situ Hybridization 

[0462] Fluorescence in situ hybridization allows the extended cDNA (or 

genomic DNA obtainable therefrom) to be mapped to a particular location on a given 
chromosome. The chromosomes to be used for fluorescence in situ hybridization 
techniques may be obtained from a variety of sources including cell cultures, tissues, or 
whole blood. 

[0463] In a preferred embodiment, chromosomal localization of an 

extended cDNA (or genomic DNA obtainable therefrom) is obtained by FISH as described 
by Cherif et al. (Proc. Natl. Acad. Sci. U.S.A., 87:6639-6643, 1990). Metaphase 
chromosomes are prepared from phytohemagglutinin (PHA)-stimulated blood cell donors. 
PHA-stimulated lymphocytes from healthy males are cultured for 72 h in RPMI-1640 
medium. For synchronization, methotrexate (10 uM) is added for 17 h, followed by 
addition of 5-bromodeoxyuridine (5-BudR, 0.1 mM) for 6 h. Colcemid (1 [ig/ml) is added 
for the last 15 min before harvesting the cells. Cells are collected, washed in RPMI, 
incubated with a hypotonic solution of KC1 (75 mM) at 37°C for 15 min and fixed in three 
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changes of methanol:acetic acid (3:1). The cell suspension is dropped onto a glass slide 
and air dried. The extended cDNA (or genomic DNA obtainable therefrom) is labeled 
with biotin-16 dUTP by nick translation according to the manufacturer's instructions 
(Bethesda Research Laboratories, Bethesda, MD), purified using a Sephadex G-50 column 
5 (Pharmacia, Upssala, Sweden) and precipitated. Just prior to hybridization, the DNA 
pellet is dissolved in hybridization buffer (50% formamide, 2 X SSC, 10% dextran sulfate, 
1 mg/ml sonicated salmon sperm DNA, pH 7) and the probe is denatured at 70°C for 5-10 
min. 

[0464] Slides kept at -20°C are treated for 1 h at 37°C with RNase A (100 

10 J-ig/ml), rinsed three times in 2 X SSC and dehydrated in an ethanol series. Chromosome 
preparations are denatured in 70% formamide, 2 X SSC for 2 min at 70°C, then 
dehydrated at 4°C. The slides are treated with proteinase K (10 jag/100 ml in 20 mM Tris- 
HC1, 2 mM CaCl 2 ) at 37°C for 8 min and dehydrated. The hybridization mixture 
containing the probe is placed on the slide, covered with a coverslip, sealed with rubber 
15 cement and incubated overnight in a humid chamber at 37°C. After hybridization and 

post-hybridization washes, the biotinylated probe is detected by avidin-FITC and amplified 
with additional layers of biotinylated goat anti-avidin and avidin-FITC. For chromosomal 
localization, fluorescent R-bands are obtained as previously described (Cherif et al., 
supra.). The slides are observed under a LEICA fluorescence microscope (DMRXA). 
20 Chromosomes are counterstained with propidium iodide and the fluorescent signal of the 
probe appears as two symmetrical yellow-green spots on both chromatids of the 
fluorescent R-band chromosome (red). Thus, a particular extended cDNA (or genomic 
DNA obtainable therefrom) may be localized to a particular cytogenetic R-band on a given 
chromosome. 

25 [0465] Once the extended cDNAs (or genomic DNAs obtainable 

therefrom) have been assigned to particular chromosomes using the techniques described 
in Examples 49-51 above, they may be utilized to construct a high resolution map of the 
chromosomes on which they are located or to identify the chromosomes in a sample. 
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EXAMPLE 52 

Use of Extended cDNAs to Construct or Expand Chromosome Maps 

[0466] Chromosome mapping involves assigning a given unique sequence 

to a particular chromosome as described above. Once the unique sequence has been 
5 mapped to a given chromosome, it is ordered relative to other unique sequences located on 
the same chromosome. One approach to chromosome mapping utilizes a series of yeast 
artificial chromosomes (YACs) bearing several thousand long inserts derived from the 
chromosomes of the organism from which the extended cDNAs (or genomic DNAs 
obtainable therefrom) are obtained. This approach is described in Ramaiah Nagaraja et al. 

10 Genome Research 7:210-222, March 1997. Briefly, in this approach each chromosome is 

broken into overlapping pieces which are inserted into the YAC vector. The YAC inserts 
are screened using PCR or other methods to determine whether they include the extended 
cDNA (or genomic DNA obtainable therefrom) whose position is to be determined. Once 
an insert has been found which includes the extended cDNA (or genomic DNA obtainable 

15 therefrom), the insert can be analyzed by PCR or other methods to determine whether the 

insert also contains other sequences known to be on the chromosome or in the region from 
which the extended cDNA (or genomic DNA obtainable therefrom) was derived. This 
process can be repeated for each insert in the YAC library to determine the location of 
each of the extended cDNAs (or genomic DNAs obtainable therefrom) relative to one 

20 another and to other known chromosomal markers. In this way, a high resolution map of 

the distribution of numerous unique markers along each of the organisms chromosomes 
may be obtained. 

[0467] As described in Example 53 below extended cDNAs (or genomic 

DNAs obtainable therefrom) may also be used to identify genes associated with a 
25 particular phenotype, such as hereditary disease or drug response. 



EXAMPLE 53 

Identification of genes associated with hereditary diseases or drug response 

[0468] This example illustrates an approach useful for the association of 

30 extended cDNAs (or genomic DNAs obtainable therefrom) with particular phenotypic 
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characteristics. In this example, a particular extended cDNA (or genomic DNA obtainable 
therefrom) is used as a test probe to associate that extended cDNA (or genomic DNA 
obtainable therefrom) with a particular phenotypic characteristic. 

[0469] Extended cDNAs (or genomic DNAs obtainable therefrom) are 

5 mapped to a particular location on a human chromosome using techniques such as those 

described in Examples 49 and 50 or other techniques known in the art. A search of 
Mendelian Inheritance in Man (V. McKusick, Mendelian Inheritance in Man (available 
on line through Johns Hopkins University Welch Medical Library) reveals the region of 
the human chromosome which contains the extended cDNA (or genomic DNA obtainable 
10 therefrom) to be a very gene rich region containing several known genes and several 
diseases or phenotypes for which genes have not been identified. The gene corresponding 
to this extended cDNA (or genomic DNA obtainable therefrom) thus becomes an 
immediate candidate for each of these genetic diseases. 

[0470] Cells from patients with these diseases or phenotypes are isolated 

15 and expanded in culture. PCR primers from the extended cDNA (or genomic DNA 
obtainable therefrom) are used to screen genomic DNA, mRNA or cDNA obtained from 
the patients. Extended cDNAs (or genomic DNAs obtainable therefrom) that are not 
amplified in the patients can be positively associated with a particular disease by further 
analysis. Alternatively, the PCR analysis may yield fragments of different lengths when 
20 the samples are derived from an individual having the phenotype associated with the 
disease than when the sample is derived from a healthy individual, indicating that the gene 
containing the extended cDNA maybe responsible for the genetic disease. 

VI. Use of Extended cDNAs (or genomic DNAs obtainable therefrom) to Construct 
Vectors 

25 [0471] The present extended cDNAs (or genomic DNAs obtainable 

therefrom) may also be used to construct secretion vectors capable of directing the 
secretion of the proteins encoded by genes inserted in the vectors. Such secretion vectors 
may facilitate the purification or enrichment of the proteins encoded by genes inserted 
therein by reducing the number of background proteins from which the desired protein 

30 must be purified or enriched. Exemplary secretion vectors are described in Example 54 
below. 
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EXAMPLE 54 

Construction of Secretion Vectors 

[0472] The secretion vectors of the present invention include a promoter 

5 capable of directing gene expression in the host cell, tissue, or organism of interest. Such 

promoters include the Rous Sarcoma Virus promoter, the SV40 promoter, the human 
cytomegalovirus promoter, and other promoters familiar to those skilled in the art. 

[0473] A signal sequence from an extended cDNA (or genomic DNA 

obtainable therefrom), such as one of the signal sequences in SEQ ID NOs; 40-59, 61-73, 

10 75-82, 84, and 130-154 as defined in Table IV above, is operably linked to the promoter 
such that the mRNA transcribed from the promoter will direct the translation of the signal 
peptide. The host cell, tissue, or organism may be any cell, tissue, or organism which 
recognizes the signal peptide encoded by the signal sequence in the extended cDNA (or 
genomic DNA obtainable therefrom). Suitable hosts include mammalian cells, tissues or 

15 organisms, avian cells, tissues, or organisms, insect cells, tissues or organisms, or yeast. 

[0474] In addition, the secretion vector contains cloning sites for inserting 

genes encoding the proteins which are to be secreted. The cloning sites facilitate the 
cloning of the insert gene in frame with the signal sequence such that a fusion protein in 
which the signal peptide is fused to the protein encoded by the inserted gene is expressed 
20 from the mRNA transcribed from the promoter. The signal peptide directs the 

extracellular secretion of the fusion protein. 

[0475] The secretion vector may be DNA or RNA and may integrate into 

the chromosome of the host, be stably maintained as an extrachromosomal replicon in the 
host, be an artificial chromosome, or be transiently present in the host. Many nucleic acid 
25 backbones suitable for use as secretion vectors are known to those skilled in the art, 

including retroviral vectors, SV40 vectors, Bovine Papilloma Virus vectors, yeast 
integrating plasmids, yeast episomal plasmids, yeast artificial chromosomes, human 
artificial chromosomes, P element vectors, baculovirus vectors, or bacterial plasmids 
capable of being transiently introduced into the host. 
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[0476] The secretion vector may also contain a polyA signal such that the 

polyA signal is located downstream of the gene inserted into the secretion vector 

[0477] After the gene encoding the protein for which secretion is desired is 

inserted into the secretion vector, the secretion vector is introduced into the host cell, 
tissue, or organism using calcium phosphate precipitation, DEAE-Dextran, 
electroporation, liposome-mediated transfection, viral particles or as naked DNA. The 
protein encoded by the inserted gene is then purified or enriched from the supernatant 
using conventional techniques such as ammonium sulfate precipitation, 
immunoprecipitation, immunochromatography, size exclusion chromatography, ion 
exchange chromatography, and hplc. Alternatively, the secreted protein may be in a 
sufficiently enriched or pure state in the supernatant or growth media of the host to permit 
it to be used for its intended purpose without further enrichment. 

[0478] The signal sequences may also be inserted into vectors designed for 

gene therapy. In such vectors, the signal sequence is operably linked to a promoter such 
that mKNA transcribed from the promoter encodes the signal peptide. A cloning site is 
located downstream of the signal sequence such that a gene encoding a protein whose 
secretion is desired may readily be inserted into the vector and fused to the signal 
sequence. The vector is introduced into an appropriate host cell. The protein expressed 
from the promoter is secreted extracellularly, thereby producing a therapeutic effect. 

[0479] The extended cDNAs or 5' ESTs may also be used to clone 

sequences located upstream of the extended cDNAs or 5' ESTs which are capable of 
regulating gene expression, including promoter sequences, enhancer sequences, and other 
upstream sequences which influence transcription or translation levels. Once identified 
and cloned, these upstream regulatory sequences may be used in expression vectors 
designed to direct the expression of an inserted gene in a desired spatial, temporal, 
developmental, or quantitative fashion. Example 55 describes a method for cloning 
sequences upstream of the extended cDNAs or 5 r ESTs. 
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EXAMPLE 55 

Use of Extended cDNAs or 5' ESTs to Clone Upstream 

Se quences from Genomic DNA 

[0480] Sequences derived from extended cDNAs or 5 r ESTs may be used 

5 to isolate the promoters of the corresponding genes using chromosome walking 

techniques. In one chromosome walking technique, which utilizes the Genome Walker™ 
kit available from Clontech, five complete genomic DNA samples are each digested with a 
different restriction enzyme which has a 6 base recognition site and leaves a blunt end. 
Following digestion, oligonucleotide adapters are ligated to each end of the resulting 
1 0 genomic DNA fragments. 

[0481 J For each of the five genomic DNA libraries, a first PCR reaction is 

performed according to the manufacturer's instructions (which are incorporated herein by 
reference) using an outer adaptor primer provided in the kit and an outer gene specific 
primer. The gene specific primer should be selected to be specific for the extended cDNA 

15 or 5' EST of interest and should have a melting temperature, length, and location in the 
extended cDNA or ' EST which is consistent with its use in PCR reactions. Each first PCR 
reaction contains 5ng of genomic DNA, 5 \xl of 10X Tth reaction buffer, 0.2 mM of each 
dNTP, 0.2 (iM each of outer adaptor primer and outer gene specific primer, 1.1 mM of 
Mg(OAc) 2 , and 1 ^il of the Tth polymerase SOX mix in a total volume of 50 \il The 

20 reaction cycle for the first PCR reaction is as follows: 1 min @ 94°C / 2 sec @ 94°C, 3 
min @ 72°C (7 cycles) / 2 sec @ 94°C, 3 min @ 67°C (32 cycles) / 5 min @ 67°C 

[0482] The product of the first PCR reaction is diluted and used as a 

template for a second PCR reaction according to the manufacturer's instructions using a 
pair of nested primers which are located internally on the amplicon resulting from the first 

25 PCR reaction. For example, 5 (il of the reaction product of the first PCR reaction mixture 
may be diluted 180 times. Reactions are made in a 50 \il volume having a composition 
identical to that of the first PCR reaction except the nested primers are used. The first 
nested primer is specific for the adaptor, and is provided with the GenomeWalker™ kit. 
The second nested primer is specific for the particular extended cDNA or 5' EST for which 

30 the promoter is to be cloned and should have a melting temperature, length, and location in 
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the extended cDNA or 5' EST which is consistent with its use in PCR reactions. The 
reaction parameters of the second PCR reaction are as follows: 1 min @ 94°C / 2 sec @ 
94°C, 3 min @ 72°C (6 cycles) / 2 sec @ 94°C, 3 min @ 67°C (25 cycles) / 5 min @ 
67°C 

5 [0483J The product of the second PCR reaction is purified, cloned, and 

sequenced using standard techniques. Alternatively, two or more human genomic DNA 
libraries can be constructed by using two or more restriction enzymes. The digested 
genomic DNA is cloned into vectors which can be converted into single stranded, circular, 
or linear DNA. A biotinylated oligonucleotide comprising at least 15 nucleotides from the 

10 extended cDNA or 5' EST sequence is hybridized to the single stranded DNA. Hybrids 

between the biotinylated oligonucleotide and the single stranded DNA containing the 
extended cDNA or EST sequence are isolated as described in Example 29 above. 
Thereafter, the single stranded DNA containing the extended cDNA or EST sequence is 
released from the beads and converted into double stranded DNA using a primer specific 

15 for the extended cDNA or 5' EST sequence or a primer corresponding to a sequence 
included in the cloning vector. The resulting double stranded DNA is transformed into 
bacteria. DNAs containing the 5' EST or extended cDNA sequences are identified by 
colony PCR or colony hybridization. 

[0484] Once the upstream genomic sequences have been cloned and 

20 sequenced as described above, prospective promoters and transcription start sites within 
the upstream sequences may be identified by comparing the sequences upstream of the 
extended cDNAs or 5' ESTs with databases containing known transcription start sites, 
transcription factor binding sites, or promoter sequences. 

[0485] In addition, promoters in the upstream sequences may be identified 

25 using promoter reporter vectors as described in Example 56. 
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EXAMPLE 56 

Identification of Promoters in Cloned Upstream Sequences 

[0486] The genomic sequences upstream of the extended cDNAs or 5' 

ESTs are cloned into a suitable promoter reporter vector, such as the pSEAP-Basic, 

5 pSEAP-Enhancer, pPgal-Basic, ppgal-Enhancer, or pEGFP-1 Promoter Reporter vectors 
available from Clontech. Briefly, each of these promoter reporter vectors include multiple 
cloning sites positioned upstream of a reporter gene encoding a readily assayable protein 
such as secreted alkaline phosphatase, p galactosidase, or green fluorescent protein. The 
sequences upstream of the extended cDNAs or 5' ESTs are inserted into the cloning sites 

10 upstream of the reporter gene in both orientations and introduced into an appropriate host 

cell The level of reporter protein is assayed and compared to the level obtained from a 
vector which lacks an insert in the cloning site. The presence of an elevated expression 
level in the vector containing the insert with respect to the control vector indicates the 
presence of a promoter in the insert. If necessary, the upstream sequences can be cloned 

15 into vectors which contain an enhancer for augmenting transcription levels from weak 

promoter sequences. A significant level of expression above that observed with the vector 
lacking an insert indicates that a promoter sequence is present in the inserted upstream 
sequence. 

[0487] Appropriate host cells for the promoter reporter vectors may be 

20 chosen based on the results of the above described determination of expression patterns of 

the extended cDNAs and ESTs. For example, if the expression pattern analysis indicates 
that the mRNA corresponding to a particular extended cDNA or 5' EST is expressed in 
fibroblasts, the promoter reporter vector may be introduced into a human fibroblast cell 
line. 

25 [0488] Promoter sequences within the upstream genomic DNA may be 

further defined by constructing nested deletions in the upstream DNA using conventional 
techniques such as Exonuclease III digestion. The resulting deletion fragments can be 
inserted into the promoter reporter vector to determine whether the deletion has reduced or 
obliterated promoter activity. In this way, the boundaries of the promoters may be defined. 

30 If desired, potential individual regulatory sites within the promoter may be identified using 
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site directed mutagenesis or linker scanning to obliterate potential transcription factor 
binding sites within the promoter individually or in combination. The effects of these 
mutations on transcription levels may be determined by inserting the mutations into the 
cloning sites in the promoter reporter vectors. 

5 

EXAMPLE 57 

Cloning and Identification of Promoters 

[0489] Using the method described in Example 55 above with 5' ESTs, 

sequences upstream of several genes were obtained. Using the primer pairs GGG AAG 
10 ATG GAG ATA GTA TTG CCT G (SEQ ID NO:29) and CTG CCA TGT ACA TGA 
TAG AGA GAT TC (SEQ ID NO:30), the promoter having the internal designation 
P13H2 (SEQ ID NO:31) was obtained. 

[0490J Using the primer pairs GTA CCA GGGG ACT GTG ACC ATT 

GC (SEQ ID NO:32) and CTG TGA CCA TTG CTC CCA AGA GAG (SEQ ID NO:33), 
15 the promoter having the internal designation PI 5B4 (SEQ ID NO:34) was obtained. 

[0491] Using the primer pairs CTG GGA TGG AAG GCA CGG TA 

(SEQ ID NO:35) and GAG ACC ACA CAG CTA GAC AA (SEQ ID NO:36), the 
promoter having the internal designation P29B6 (SEQ ID NO:37) was obtained. 

[0492] Figure 8 provides a schematic description of the promoters isolated 

20 and the way they are assembled with the corresponding 5' tags. The upstream sequences 

were screened for the presence of motifs resembling transcription factor binding sites or 
known transcription start sites using the computer program Matlnspector release 2.0, 
August 1996. 

[0493] Figure 9 describes the transcription factor binding sites present in 

25 each of these promoters. The columns labeled matrice provides the name of the 
Matlnspector matrix used. The column labeled position provides the 5' position of the 
promoter site. Numeration of the sequence starts from the transcription site as determined 
by matching the genomic sequence with the 5' EST sequence. The column labeled 
"orientation" indicates the DNA strand on which the site is found, with the + strand being 
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the coding strand as determined by matching the genomic sequence with the sequence of 
the 5' EST. The column labeled "score" provides the Matlnspector score found for this 
site. The column labeled "length" provides the length of the site in nucleotides. The 
column labeled "sequence" provides the sequence of the site found. 

5 [0494] The promoters and other regulatory sequences located upstream of 

the extended cDNAs or 5' ESTs may be used to design expression vectors capable of 
directing the expression of an inserted gene in a desired spatial, temporal, developmental, 
or quantitative manner. A promoter capable of directing the desired spatial, temporal, 
developmental, and quantitative patterns may be selected using the results of the 
10 expression analysis described in Example 26 above. For example, if a promoter which 

confers a high level of expression in muscle is desired, the promoter sequence upstream of 
an extended cDNA or 5' EST derived from an mRNA which is expressed at a high level in 
muscle, as determined by the method of Example 26, may be used in the expression 
vector. 

15 [0495] Preferably, the desired promoter is placed near multiple restriction 

sites to facilitate the cloning of the desired insert downstream of the promoter, such that 
the promoter is able to drive expression of the inserted gene. The promoter may be 
inserted in conventional nucleic acid backbones designed for extrachromosomal 
replication, integration into the host chromosomes or transient expression. Suitable 

20 backbones for the present expression vectors include retroviral backbones, backbones from 
eukaryotic episomes such as SV40 or Bovine Papilloma Virus, backbones from bacterial 
episomes, or artificial chromosomes. 

[0496] Preferably, the expression vectors also include a polyA signal 

downstream of the multiple restriction sites for directing the polyadenylation of mRNA 
25 transcribed from the gene inserted into the expression vector. 

[0497] Following the identification of promoter sequences using the 

procedures of Examples 55-57, proteins which interact with the promoter may be 
identified as described in Example 58 below. 
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EXAMPLE 58 

Identification of Proteins Which Interact with Promoter Sequences, Upstream 

Regulatory Sequences, or mRNA 

[0498] Sequences within the promoter region which are likely to bind 

5 transcription factors may be identified by homology to known transcription factor binding 
sites or through conventional mutagenesis or deletion analyses of reporter plasmids 
containing the promoter sequence. For example, deletions may be made in a reporter 
plasmid containing the promoter sequence of interest operably linked to an assayable 
reporter gene. The reporter plasmids carrying various deletions within the promoter region 

10 are transfected into an appropriate host cell and the effects of the deletions on expression 
levels is assessed. Transcription factor binding sites within the regions in which deletions 
reduce expression levels may be further localized using site directed mutagenesis, linker 
scanning analysis, or other techniques familiar to those skilled in the art. Nucleic acids 
encoding proteins which interact with sequences in the promoter may be identified using 

15 one-hybrid systems such as those described in the manual accompanying the Matchmaker 

One-Hybrid System kit available from Clontech (Catalog No. Kl 603-1), the disclosure of 
which is incorporated herein by reference. Briefly, the Matchmaker One-hybrid system is 
used as follows. The target sequence for which it is desired to identify binding proteins is 
cloned upstream of a selectable reporter gene and integrated into the yeast genome. 

20 Preferably, multiple copies of the target sequences are inserted into the reporter plasmid in 
tandem. 

[0499] A library comprised of fusions between cDNAs to be evaluated for 

the ability to bind to the promoter and the activation domain of a yeast transcription factor, 
such as GAL4, is transformed into the yeast strain containing the integrated reporter 

25 sequence. The yeast are plated on selective media to select cells expressing the selectable 

marker linked to the promoter sequence. The colonies which grow on the selective media 
contain genes encoding proteins which bind the target sequence. The inserts in the genes 
encoding the fusion proteins are further characterized by sequencing. In addition, the 
inserts may be inserted into expression vectors or in vitro transcription vectors. Binding of 

30 the polypeptides encoded by the inserts to the promoter DNA may be confirmed by 
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techniques familiar to those skilled in the art, such as gel shift analysis or DNAse 
protection analysis. 

VII. Use of Extended cDNAs (or Genomic DNAs Obtainable Therefrom) in Gene 
5 Therapy 

[0500J The present invention also comprises the use of extended cDNAs 

(or genomic DNAs obtainable therefrom) in gene therapy strategies, including antisense 
and triple helix strategies as described in Examples 57 and 58 below. In antisense 
approaches, nucleic acid sequences complementary to an mRNA are hybridized to the 

10 mRNA intracellular^, thereby blocking the expression of the protein encoded by the 

mRNA. The antisense sequences may prevent gene expression through a variety of 
mechanisms. For example, the antisense sequences may inhibit the ability of ribosomes to 
translate the mRNA. Alternatively, the antisense sequences may block transport of the 
mRNA from the nucleus to the cytoplasm, thereby limiting the amount of mRNA available 

15 for translation. Another mechanism through which antisense sequences may inhibit gene 
expression is by interfering with mRNA splicing. In yet another strategy, the antisense 
nucleic acid may be incorporated in a ribozyme capable of specifically cleaving the target 
mRNA. 
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EXAMPLE 59 

Preparation and Use of Antisense Oligonucleotides 

[0501] The antisense nucleic acid molecules to be used in gene therapy 

may be either DNA or RNA sequences. They may comprise a sequence complementary to 

5 the sequence of the extended cDNA (or genomic DNA obtainable therefrom). The 
antisense nucleic acids should have a length and melting temperature sufficient to permit 
formation of an intracellular duplex having sufficient stability to inhibit the expression of 
the mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use 
in gene therapy are disclosed in Green et al., Ann. Rev. Biochem. 55:569-597 (1986) and 

10 Izant and Weintraub, Cell 36:1007-1015 (1984), which are hereby incorporated by 

reference. 

[0502] In some strategies, antisense molecules are obtained from a 

nucleotide sequence encoding a protein by reversing the orientation of the coding region 
with respect to a promoter so as to transcribe the opposite strand from that which is 
15 normally transcribed in the cell The antisense molecules may be transcribed using in vitro 
transcription systems such as those which employ T7 or SP6 polymerase to generate the 
transcript. Another approach involves transcription of the antisense nucleic acids in vivo 
by operably linking DNA containing the antisense sequence to a promoter in an expression 
vector. 

20 [0503] Alternatively, oligonucleotides which are complementary to the 

strand normally transcribed in the cell may be synthesized in vitro. Thus, the antisense 
nucleic acids are complementary to the corresponding mRNA and are capable of 
hybridizing to the mRNA to create a duplex. In some embodiments, the antisense 
sequences may contain modified sugar phosphate backbones to increase stability and make 

25 them less sensitive to RNase activity. Examples of modifications suitable for use in 

antisense strategies are described by Rossi et al, Pharmacol. Ther, 50(2):245-254, 
(1991). 

[0504] Various types of antisense oligonucleotides complementary to the 

sequence of the extended cDNA (or genomic DNA obtainable therefrom) may be used. In 
30 one preferred embodiment, stable and semi-stable antisense oligonucleotides described in 
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International Application No. PCT WO94/23026, hereby incorporated by reference, are 
used. In these molecules, the 3 f end or both the 3' and 5' ends are engaged in 
intramolecular hydrogen bonding between complementary base pairs. These molecules 
are better able to withstand exonuclease attacks and exhibit increased stability compared to 
5 conventional antisense oligonucleotides. 

[0505] In another preferred embodiment, the antisense 

oligodeoxynucleotides against herpes simplex virus types 1 and 2 described in 
International Application No. WO 95/04141, hereby incorporated by reference, are used. 

[0506] In yet another preferred embodiment, the covalently cross-linked 

10 antisense oligonucleotides described in International Application No. WO 96/31523, 
hereby incorporated by reference, are used. These double- or single-stranded 
oligonucleotides comprise one or more, respectively, inter- or intra-oligonucleotide 
covalent cross-linkages, wherein the linkage consists of an amide bond between a primary 
amine group of one strand and a carboxyl group of the other strand or of the same strand, 
15 respectively, the primary amine group being directly substituted in the 2' position of the 
strand nucleotide monosaccharide ring, and the carboxyl group being carried by an 
aliphatic spacer group substituted on a nucleotide or nucleotide analog of the other strand 
or the same strand, respectively. 

[0507] The antisense oligodeoxynucleotides and oligonucleotides 

20 disclosed in International Application No. WO 92/18522, incorporated by reference, may 
also be used. These molecules are stable to degradation and contain at least one 
transcription control recognition sequence which binds to control proteins and are effective 
as decoys therefor. These molecules may contain "hairpin" structures, "dumbbell" 
structures, "modified dumbbell" structures, "cross-linked" decoy structures and "loop" 
25 structures. 

[0508] In another preferred embodiment, the cyclic double-stranded 

oligonucleotides described in European Patent Application No. 0 572 287 A2, hereby 
incorporated by reference are used. These ligated oligonucleotide "dumbbells" contain the 
binding site for a transcription factor and inhibit expression of the gene under control of 
30 the transcription factor by sequestering the factor. 
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[0509] Use of the closed antisense oligonucleotides disclosed in 

International Application No. WO 92/19732, hereby incorporated by reference is also 
contemplated. Because these molecules have no free ends, they are more resistant to 
degradation by exonucleases than are conventional oligonucleotides. These 
5 oligonucleotides may be multifunctional, interacting with several regions which are not 
adjacent to the target mRNA. 

[0510] The appropriate level of antisense nucleic acids required to inhibit 

gene expression may be determined using in vitro expression analysis. The antisense 
molecule may be introduced into the cells by diffusion, injection, infection or transfection 

10 using procedures known in the art. For example, the antisense nucleic acids can be 
introduced into the body as a bare or naked oligonucleotide, oligonucleotide encapsulated 
in lipid, oligonucleotide sequence encapsulated by viral protein, or as an oligonucleotide 
operably linked to a promoter contained in an expression vector. The expression vector 
may be any of a variety of expression vectors known in the art, including retroviral or viral 

15 vectors, vectors capable of extrachromosomal replication, or integrating vectors. The 
vectors may be DNA or RNA. 

[0511] The antisense molecules are introduced onto cell samples at a 

number of different concentrations preferably between lxl(T l0 M to lxl0 4 M. Once the 
minimum concentration that can adequately control gene expression is identified, the 

20 optimized dose is translated into a dosage suitable for use in vivo. For example, an 
inhibiting concentration in culture of IxlO" 7 translates into a dose of approximately 0.6 
mg/kg bodyweight. Levels of oligonucleotide approaching 100 mg/kg bodyweight or 
higher may be possible after testing the toxicity of the oligonucleotide in laboratory 
animals. It is additionally contemplated that cells from the vertebrate are removed, treated 

25 with the antisense oligonucleotide, and reintroduced into the vertebrate. 

[0512] It is further contemplated that the antisense oligonucleotide 

sequence is incorporated into a ribozyme sequence to enable the antisense to specifically 
bind and cleave its target mRNA. For technical applications of ribozyme and antisense 
oligonucleotides see Rossi et ah, supra. 

30 [0513] In a preferred application of this invention, the polypeptide encoded 

by the gene is first identified, so that the effectiveness of antisense inhibition on translation 
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can be monitored using techniques that include but are not limited to antibody-mediated 
tests such as RIAs and ELISA, functional assays, or radiolabeling. 

[0514] The extended cDNAs of the present invention (or genomic DNAs 

obtainable therefrom) may also be used in gene therapy approaches based on intracellular 

5 triple helix formation. Triple helix oligonucleotides are used to inhibit transcription from a 
genome. They are particularly useful for studying alterations in cell activity as it is 
associated with a particular gene. The extended cDNAs (or genomic DNAs obtainable 
therefrom) of the present invention or, more preferably, a portion of those sequences, can 
be used to inhibit gene expression in individuals having diseases associated with 

10 expression of a particular gene. Similarly, a portion of the extended cDNA (or genomic 
DNA obtainable therefrom) can be used to study the effect of inhibiting transcription of a 
particular gene within a cell. Traditionally, homopurine sequences were considered the 
most useful for triple helix strategies. However, homopyrimidine sequences can also 
inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major groove 

15 at homopurine:homopyrimidine sequences. Thus, both types of sequences from the 

extended cDNA or from the gene corresponding to the extended cDNA are contemplated 
within the scope of this invention. 
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EXAMPLE 60 

Preparation and use of Triple Helix Probes 

[0515] The sequences of the extended cDNAs (or genomic DNAs 

obtainable therefrom) are scanned to identify 10-mer to 20-mer homopyrimidine or 
5 homopurine stretches which could be used in triple-helix based strategies for inhibiting 

gene expression. Following identification of candidate homopyrimidine or homopurine 
stretches, their efficiency in inhibiting gene expression is assessed by introducing varying 
amounts of oligonucleotides containing the candidate sequences into tissue culture cells 
which normally express the target gene. The oligonucleotides may be prepared on an 
10 oligonucleotide synthesizer or they may be purchased commercially from a company 
specializing in custom oligonucleotide synthesis, such as GENSET, Paris, France. 

[0516] The oligonucleotides may be introduced into the cells using a 

variety of methods known to those skilled in the art, including but not limited to calcium 
phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection 
15 or native uptake, 

[0517] Treated cells are monitored for altered cell function or reduced 

gene expression using techniques such as Northern blotting, RNase protection assays, or 
PCR based strategies to monitor the transcription levels of the target gene in cells which 
have been treated with the oligonucleotide . The cell functions to be monitored are 

20 predicted based upon the homologies of the target gene corresponding to the extended 
cDNA from which the oligonucleotide was derived with known gene sequences that have 
been associated with a particular function. The cell functions can also be predicted based 
on the presence of abnormal physiologies within cells derived from individuals with a 
particular inherited disease, particularly when the extended cDNA is associated with the 

25 disease using techniques described in Example 53. 

[0518] The oligonucleotides which are effective in inhibiting gene 

expression in tissue culture cells may then be introduced in vivo using the techniques 
described above and in Example 59 at a dosage calculated based on the in vitro results, as 
described in Example 59. 
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[0519] In some embodiments, the natural (beta) anomers of the 

oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide 
more resistant to nucleases. Further, an intercalating agent such as ethidium bromide, or 
the like, can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple 
5 helix. For information on the generation of oligonucleotides suitable for triple helix 
formation see Griffin et al. (Science 245:967-971 (1989), which is hereby incorporated by 
this reference). 

EXAMPLE 61 

10 Use of Extended cDNAs to Express an Encoded Protein in a Host Organism 

[0520] The extended cDNAs of the present invention may also be used to 

express an encoded protein in a host organism to produce a beneficial effect. In such 
procedures, the encoded protein may be transiently expressed in the host organism or 
stably expressed in the host organism. The encoded protein may have any of the activities 
1 5 described above. The encoded protein may be a protein which the host organism lacks or, 
alternatively, the encoded protein may augment the existing levels of the protein in the 
host organism. 

[0521] A full length extended cDNA encoding the signal peptide and the 

mature protein, or an extended cDNA encoding only the mature protein is introduced into 
20 the host organism. The extended cDNA may be introduced into the host organism using a 
variety of techniques known to those of skill in the art. For example, the extended cDNA 
may be injected into the host organism as naked DNA such that the encoded protein is 
expressed in the host organism, thereby producing a beneficial effect. 

[0522] Alternatively, the extended cDNA may be cloned into an 

25 expression vector downstream of a promoter which is active in the host organism. The 
expression vector may be any of the expression vectors designed for use in gene therapy, 
including viral or retroviral vectors. 

[0523] The expression vector may be directly introduced into the host 

organism such that the encoded protein is expressed in the host organism to produce a 
30 beneficial effect. In another approach, the expression vector may be introduced into cells 
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in vitro. Cells containing the expression vector are thereafter selected and introduced into 
the host organism, where they express the encoded protein to produce a beneficial effect. 

EXAMPLE 62 

5 Use Of Signal Peptides Encoded By 5' Ests Or Sequences 

Obtained Therefrom To Import Proteins Into Cells 

[0524] The short core hydrophobic region (h) of signal peptides encoded 

by the 5'ESTS or extended cDNAs derived from the 5'ESTs of the present invention may 
also be used as a carrier to import a peptide or a protein of interest, so-called cargo, into 
10 tissue culture cells (Lin et al, 1 Biol Chem., 270: 14225-14258 (1995); Du et al, I 

Peptide Res., 51: 235-243 (1998); Rojas etal, Nature Biotech., 16: 370-375 (1998)). 

[0525] When cell permeable peptides of limited size (approximately up to 

25 amino acids) are to be translocated across cell membrane, chemical synthesis may be 
used in order to add the h region to either the C-terminus or the N-tenninus to the cargo 

15 peptide of interest. Alternatively, when longer peptides or proteins are to be imported into 
cells, nucleic acids can be genetically engineered, using techniques familiar to those skilled 
in the art, in order to link the extended cDNA sequence encoding the h region to the 5' or 
the 3' end of a DNA sequence coding for a cargo polypeptide. Such genetically 
engineered nucleic acids are then translated either in vitro or in vivo after transfection into 

20 appropriate cells, using conventional techniques to produce the resulting cell permeable 
polypeptide. Suitable hosts cells are then simply incubated with the cell permeable 
polypeptide which is then translocated across the membrane. 

[0526] This method may be applied to study diverse intracellular functions 

and cellular processes. For instance, it has been used to probe functionally relevant 
25 domains of intracellular proteins and to examine protein-protein interactions involved in 
signal transduction pathways (Lin et al } supra; Lin et al, J. Biol Chem., 271: 5305-5308 
(1996); Rojas et al, J. Biol Chem., 271: 27456-27461 (1996); Liu et al f Proc. Natl Acad. 
Sci USA, 93: 1 1819-1 1824 (1996); Rojas et al, Bioch. Biophys. Res. Commun., 234: 675- 
680(1997)). 
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[0527] Such techniques may be used in cellular therapy to import proteins 

producing therapeutic effects. For instance, cells isolated from a patient may be treated 
with imported therapeutic proteins and then re-introduced into the host organism. 

[0528] Alternatively, the h region of signal peptides of the present 

5 invention could be used in combination with a nuclear localization signal to deliver nucleic 

acids into cell nucleus. Such oligonucleotides may be antisense oligonucleotides or 
oligonucleotides designed to form triple helixes, as described in examples 59 and 60 
respectively, in order to inhibit processing and maturation of a target cellular RNA. 

10 EXAMPLE 63 

Reassembling & Resequencing of Clones 

[0529] Full length cDNA clones obtained by the procedure described in 

Example 27 were double-sequenced. These sequences were assembled and the resulting 
consensus sequences were then reanalyzed. Open reading frames were reassigned 
15 following essentially the same process as the one described in Example 27. 

[0530] After this reanalysis process a few abnormalities were revealed. 

The sequence presented in SEQ ED NO: 84 is apparently unlikely to be genuine full 
length cDNAs. This clone is more probably a 3' truncated cDNA sequence based on 
homology studies with existing protein sequences. Similarly, the sequences presented 
20 in SEQ ID NOs: 60, 76, 83 and 84 may also not be genuine full length cDNAs based on 
homology studies with existing protein sequences. Although these sequences encode a 
potential start methionine, except for SEQ ID NO:60, they could represent a 5 r truncated 
cDNA. 

[0531] Finally, after the reassignment of open reading frames for the 

25 clones, new open reading frames were chosen in some instances. For example, in the 
case of SEQ ID NOs: 60, 74 and 83 the new open reading frames were no longer 
predicted to contain a signal peptide. 

[0532] As discussed above, Table IV provides the sequence 

identification numbers of the extended cDNAs of the present invention, the locations of 
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the full coding sequences in SEQ ID NOs: 40-84 and 130-154 (i.e. the nucleotides 
encoding both the signal peptide and the mature protein, listed under the heading FCS 
location in Table IV), the locations of the nucleotides in SEQ ID NOs: 40-84 and 130- 
154 which encode the signal peptides (listed under the heading SigPep Location in 
5 Table IV), the locations of the nucleotides in SEQ ID NOs: 40-84 and 130-154 which 
encode the mature proteins generated by cleavage of the signal peptides (listed under 
the heading Mature Polypeptide Location in Table IV), the locations in SEQ ID NOs: 
40-84 and 130-154 of stop codons (listed under the heading Stop Codon Location in 
Table IV) the locations in SEQ ID NOs: 40-84 and 130-154 of polyA signals (listed 
10 under the heading g PolyA Signal Location in Table IV) and the locations of polyA 
sites (listed under the heading PolyA Site Location in Table IV). 

[0533J As discussed above, Table V lists the sequence identification 

numbers of the polypeptides of SEQ ID NOs: 85-129 and 155-179, the locations of the 
amino acid residues of SEQ ID NOs: 85-129 and 155-179 in the full length polypeptide 

15 (second column), the locations of the amino acid residues of SEQ ID NOs: 85-129 and 
155-179 in the signal peptides (third column), and the locations of the amino acid 
residues of SEQ ID NOs: 85-129 and 155-179 in the mature polypeptide created by 
cleaving the signal peptide from the full length polypeptide (fourth column). In Table 
V, and in the appended sequence listing, the first amino acid of the mature protein 

20 resulting from cleavage of the signal peptide is designated as amino acid number 1 and 
the first amino acid of the signal peptide is designated with the appropriate negative 
number, in accordance with the regulations governing sequence listings. 

Example 64 

25 Functional Analysis of Predicted Protein Sequences 

[0534] Following double-sequencing, new contigs were assembled for 

each of the extended cDNAs of the present invention and each was compared to known 
sequences available at the time of filing. These sequences originate from the following 
databases : Genbank (release 108 and daily releases up to October, 15, 1998), Genseq 
30 (release 32) PIR (release 53) and Swissprot (release 35). The predicted proteins of the 
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present invention matching known proteins were further classified into 3 categories 
depending on the level of homology. 

[0535] The first category contains proteins of the present invention 

exhibiting more than 80% identical amino acid residues on the whole length of the 
5 matched protein. They are clearly close homologues which most probably have the 
same function or a very similar function as the matched protein. 

[0536] The second category contains proteins of the present invention 

exhibiting more remote homologies (30 to 80% over the whole protein) indicating that 
the protein of the present invention is susceptible to have a function similar to the one of 
1 0 the matched protein. 

[0537] The third category contains proteins exhibiting either high 

homology (90 to 100%) to a short domain or more remote homology (40 to 60%) to a 
larger domain of a known protein indicating that the matched protein and the protein of 
the invention may share similar features. 

15 [0538] It should be noted that the numbering of amino acids in the 

protein sequences discussed in Figures 10 to 12, and Table VIII, the first methionine 
encountered is designated as amino acid number 1. In the appended sequence listing, the 
first amino acid of the mature protein resulting from cleavage of the signal peptide is 
designated as amino acid number 1 and the first amino acid of the signal peptide is 

20 designated with the appropriate negative number, in accordance with the regulations 
governing sequence listings. 

[0539] In addition, all of the corrected amino acid sequences (SEQ ID 

NOs: 85-129 and 155-179) were scanned for the presence of known protein signatures 
and motifs. This search was performed against the Prosite 15.0 database, using the 
25 Proscan software from the GCG package. Functional signatures and their locations are 
indicated in Table VIII. 
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A) Proteins which are closely related to known proteins 

Protein of SEP ID NO: 120 (internal designation 26-44-1-B5-CL3 1) 

[0540] The protein of SEQ ID NO: 120 encoded by the extended cDNA 

SEQ ID NO: 75 isolated from ovary shows extensive homology to a human protein 

5 called phospholemman or PLM and its hornologues in rodent and canine species. PLM 

is encoded by the nucleic acid sequence of Genbank accession number U72245 and has 
the amino acid sequence of SEQ ID NO : 180. Phospholemman is a prominent plasma 
membrane protein whose phosphorylation correlates with an increase in contractility of 
myocardium and skeletal muscle. Initially described as a simple chloride channel, it has 

10 recently been shown to be a channel for taurine that acts as an osmolyte in the 
regulation of cell volume (Moorman et al, Adv Exp. Med. BioL, 442:219-228 (1998)). 

[0541] As shown by the alignment in Figure 10 between the protein of 

SEQ ID NO: 120 and PLM, the amino acid residues are identical except for positions 3 
and 5 in the 92 amino acid long matched protein. The substitution of a proline residue 

15 at position 3 par another neutral residue, serine, is conservative. In addition, the protein 
of the invention also exhibits the typical ATP1G /PLM/MAT8 PROSITE signature 
(position 27 to 40 in bold in Figure 10) for a family containing mostly proteins known 
to be either chloride channels or chloride channel regulators In addition, the protein of 
invention contains 2 short transmembrane segments from positions 1 to 21 and from 37 

20 to 57 as predicted by the software TopPred II (Claros and von Heijne, CABIOS applic. 
Notes, 10 :685-686 (1994)). The first segment (in italic) corresponds to the signal 
peptide of PLM and the second transmembrane domains (underlined) matches the 
transmembrane region (double-underlined) shown to be the chloride channel itself 
(Chen et ai, Circ. Res., 82:367-374 (1998)). 

25 [0542] Taken together, these data suggest that the protein of SEQ ID 

NO 120 may be involved in the regulation of cell volume and in tissue contractility. 
Thus, this protein may be useful in diagnosing and/or treating several types of disorders 
including, but not limited to, cancer, diarrhea, fertility disorders, and in contractility 
disorders including muscle disorders, pulmonary disorders and myocardial disorders. 
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Proteins of SEQ ID NOs: 121 (internal designation 47-4-4-C6-CL2 3) 

[0543] The protein of SEQ ID NO: 121 encoded by the extended cDNA 

SEQ ID NO: 76 found in substantia nigra shows extensive homology with the human 
E25 protein. The E25 protein is encoded by the nucleic acid sequence of Genbank 
accession number AF038953 and has the amino acid sequence of SEQ ID NO: 181. The 
matched protein might be involved in the development and differentiation of 
haematopoietic stem/progenitor cells. In addition, it is the human homologue of a 
murine protein thought to be involved in chondro-osteogenic differentiation and 
belonging to a novel multigene family of integral membrane proteins (Deleersnijder et 
ah 1 Biol Chem., 271 :19475-19482 (1996)). 

[0544] As shown by the alignments in Figure 1 1 between the protein of 

SEQ ID NO.T21 and E25, the amino acid residues are identical except for positions 9, 
24 and 121 in the 263 amino acid long matched sequence. All these substitutions are 
conservative. In addition, the protein of invention contains one short transmembrane 
segment from positions 1 to 21 (underlined in Figure 11) matching the one predicted for 
the murine E25 protein as predicted by the software TopPred II (Claros and von Heijne, 
CABIOS applia Notes, 10 :685-686 (1994)). 

[0545] Taken together, these data suggest that the protein of SEQ ID 

NO: 121 may be involved in cellular proliferation and differentiation, and/or in 
haematopoiesis. Thus, this protein may be useful in diagnosing and/or treating several 
types of disorders including, but not limited to, cancer, hematological, chondro- 
osteogenic and embryogenetic disorders. 

Proteins of SEQ ID NO: 128 (internal designation 58-34-2-H8-CL1 3) 

[0546] The protein of SEQ ID NO: 128 encoded by the extended cDNA 

SEQ ID NO: 83 isolated from kidney shows extensive homology to the murine WW- 
domain binding protein 1 or WWBP-1. WWBP-1 is encoded by the nucleic acid 
sequence of Genbank accession number U40825 and has the amino acid sequence of 
SEQ ID NO : 182. This protein is expressed in placenta, lung, liver and kidney is 
thought to play a role in intracellular signaling by binding to the WW domain of the Yes 
protooncogene-associated protein via its so-called PY domain (Chen and Sudol, Proc. 
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Natl Acad. Set, 92 :7819-7823 (1995)). The WW - PY domains are thought to 
represent a new set of modular protein-binding sequences just like the SH3 - PXXP 
domains (Sudol et al, FEES Lett, 369 :67-71 (1995)). 

[0547] As shown by the alignments of Figure 12 between the protein of 

5 SEQ ID NO:128 and WWBP-1, the amino acid residues are identical to those of the 305 
amino acid long matched protein except for positions 53, 66, 78, 89, 92, 94, 96, 100, 
102, 106, 110, 113, 124, 128, 136, 139, 140, 142-144, 166, 168, 173, 176, 178, 181, 
182, 188, 196, 199, 201, 202, 207 and 210 of the matched protein. 68% of these 
substitutions are conservative. Indeed the histidine-rich PY domain is present in the 
10 protein of the invention (positions 82-86 in bold in Figure 12). 

[0548] Taken together, these data suggest that the protein of SEQ ED 

NO: 128 may play a role in intracellular signaling. Thus, this protein may be useful in 
diagnosing and/or treating several types of disorders including, but not limited to, 
cancer, neurodegenerative diseases, cardiovascular disorders, hypertension, renal injury 
15 and repair and septic shock. 

B) Proteins which are remotely related to proteins with known functions 

Protein of SEQ ID NO: 97 (internal designation 108-004- 5-0-G6-FL) 

[0549] The protein SEQ ID NO: 97 found in liver encoded by the 

extended cDNA SEQ ID NO: 52 shows homology to a lectin-like oxidized LDL 

20 receptor (LOX-1) found in human, bovine and murine species. Such type II proteins 

with a C-lectin-like domain, expressed in vascular endothelium and vascular-rich 
organs, bind and internalize oxidatively modified low-density lipoproteins (Sawamura 
et al y Nature^ 'hftfrril-ll, (1997)). The oxidized lipoproteins have been implicated in 
the pathogenesis of atherosclerosis, a leading cause of death in industrialized countries 

25 (see review by Parthasarathy et al, Biochem. Pharmacol. 56:279-284 (1998)). In 

addition, type II membrane proteins with a C-terminus C-type lectin domain, also 
known as carbohydrate-recognition domains, also include proteins involved in target- 
cell recognition and cell activation. 

[0550] The protein of invention has the typical structure of a type II 

30 protein belonging to the C-type lectin family. Indeed, it contains a short 31-amino-acid- 



-155- 



36US3DIV 

long N-terminal tail, a transmembrane segment from positions 32 to 52 matching the 
one predicted for human LOX-1 and a large 177-amino-acid-long C-terminal tail as 
predicted by the software TopPred II (Claros and von Heijne ? CABIOS applic. Notes, 
10 :685-686 (1994)). All six cysteines of LOX-1 C-type lectin domain are also 
5 conserved in the protein of the invention (positions 102, 113, 130, 195, 208 and 216) 

although the characteristic PROSITE signature of this family is not. The LOX-1 protein 
is encoded by the nucleic acid sequence of Genbank accession number: ABO 107 10. 

[0551] Taken together, these data suggest that the protein of SEQ ID 

NO: 97 may be involved in the metabolism of lipids and/or in cell-cell or cell-matrix 
10 interactions and/or in cell activation. Thus, this protein or part therein, may be useful in 
diagnosing and treating several disorders including, but not limited to, cancer, 
hyperlipidaemia, cardiovascular disorders and neurodegenerative disorders. 

Protein of SEP ID NO: 111 (internal designation 108-008-5-0-G12-FL) 

[0552] The protein SEQ ID NO: 111 encoded by the extended cDNA 

15 SEQ ID NO:66 shows homology to a mitochondrial protein found in Saccharomyces 
Cerevisiae (PIR:S72254) which is similar to E. Coli ribosomal protein L36. The typical 
PROSITE signature for ribosomal L36 is present in the protein of the invention 
(positions 76-102) except for a substitution of a tryptophane residue instead of a valine, 
leucine, isoleucine, methionine or asparagine residue. 

20 [0553] Taken together, these data suggest that the protein of SEQ ID 

NO: 1 1 1 may be involved in protein biosynthesis. Thus, this protein may be useful in 
diagnosing and/or treating several types of disorders including, but not limited to, 
cancer. 

Protein of SEQ ID NO: 94 (internal designation 108-00 4-5-0-D10-FL) 

25 [0554] The protein SEQ ID NO: 94 encoded by the extended cDNA 

SEQ ED NO: 49 shows remote homology to a subfamily of beta4-galactosyltransferases 
widely conserved in animals (human, rodents, cow and chicken). Such enzymes, 
usually type II membrane proteins located in the endoplasmic reticulum or in the Golgi 
apparatus, catalyzes the biosynthesis of glycoproteins, glycolipid glycans and lactose. 

30 Their characteristic features defined as those of subfamily A in Breton et al, J. 
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Biochem., 123:1000-1009 (1998) are pretty well conserved in the protein of the 
invention, especially the region I containing the DVD motif (positions 163-165) thought 
to be involved either in UDP binding or in the catalytic process itself. 

[0555] In addition, the protein of invention has the typical structure of a 

type II protein. Indeed, it contains a short 28-amino-acid-long N-terminal tail, a 
transmembrane segment from positions 29 to 49 and a large 278-amino-acid-long C- 
terminal tail as predicted by the software TopPred II (Claros and von Heijne, CABIOS 
applic. Notes, 10 :685-686 (1994)). 

[0556] Taken together, these data suggest that the protein of SEQ ID 

NO: 94 may play a role in the biosynthesis of polysaccharides, and of the carbohydrate 
moieties of glycoproteins and glycolipids and/or in cell-cell recognition. Thus, this 
protein may be useful in diagnosing and/or treating several types of disorders including, 
but not limited to, cancer, atherosclerosis, cardiovascular disorders, autoimmune 
disorders and rheumatic diseases including rheumatoid arthritis. 

Protein of SEP ID NO: 104 (internal designation 108-006-5-0-G2-FL) 

[0557] The protein of SEQ ID NO: 104 encoded by the extended cDNA 

SEQ ID NO: 59 shows homology to a neuronal murine protein NP15.6 whose 
expression is developmentally regulated. NP15.6 protein is encoded by the nucleic acid 
sequence of Genbank accession number Y08702. 

[0558] Taken together, these data suggest that the protein of SEQ ID 

NO: 104 may be involved in cellular proliferation and differentiation. Thus, this protein 
may be useful in diagnosing and/or treating several types of disorders including, but not 
limited to, cancer, neurodegenerative disorders and embryogenetic disorders. 

C) Proteins homologous to a domain of a protein with known function 

Protein of SEQ ID NO: 1 13 (internal designation 108-009-5-0-A2-FL) 

[0559] The protein of SEQ ID NO: 113 encoded by the extended cDNA 

SEQ ID NO: 68 shows extensive homology to the bZIP family of transcription factors, 
and especially to the human luman protein. (Lu et ah, Mol Cell Biol, 17 ;51 17-5126 
(1997)). The human luman protein is encoded by the nucleic acid sequence of Genbank 
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accession number : AF009368. The match include the whole bZIP domain composed 
of a basic DNA-binding domain and of a leucine zipper allowing protein dimerization. 
The basic domain is conserved in the protein of the invention as shown by the 
characteristic PROSITE signature (positions 224-237) except for a conservative 
5 substitution of a glutamic acid with an aspartic acid in position 233. The typical 

PROSITE signature for leucine zipper is also present (positions 259 to 280). Secreted 
proteins may have nucleic acid binding domain as shown by a nematode protein thought 
to regulate gene expression which exhibits zinc fingers as well as a functional signal 
peptide (Hoist and Zipfel, J. Biol Chem., 271 :16275-16733, 1996). 

10 [0560] Taken together, these data suggest that the protein of SEQ ID 

NO: 113 may bind to DNA, hence regulating gene expression as a transcription factor. 
Thus, this protein may be useful in diagnosing and/or treating several types of disorders 
including, but not limited to, cancer. 

Proteins of SEQ ID NO: 129 (internal designation 76-1 3-3- A9-CL1 1) 

15 [0561] The protein of SEQ ID NO: 129 encoded by the extended cDNA 

SEQ ID NO: 84 shows homology with part of a human seven transmembrane protein. 
The human seven transmembrane protein is encoded by the nucleic acid sequence of 
Genbank accession number Y11395. The matched protein potentially associated to 
stomatin may act as a G-protein coupled receptor and is likely to be important for the 

20 signal transduction in neurons and haematopoietic cells (Mayer et al t Biochem. Biophys. 

Acta,, 1395 :301-308 (1998)). 

[0562] Taken together, these data suggest that the protein of SEQ ID 

NO: 129 may be involved in signal transduction. Thus, this protein may be useful in 
diagnosing and/or treating several types of disorders including, but not limited to, 
25 cancer, neurodegenerative diseases, cardiovascular disorders, hypertension, renal injury 

and repair and septic shock. 

Proteins of SEQ ID NO: 95 (internal designation 108-004-5-0-E8-FL) 

[0563] The protein of SEQ ID NO: 95 encoded by the extended cDNA 

SEQ ID NO: 50 exhibit the typical PROSITE signature for amino acid permeases 
30 (positions 5 to 66) which are integral membrane proteins involved in the transport of 
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amino acids into the cell In addition, the protein of invention has a transmembrane 
segment from positions 9 to 29 as predicted by the software TopPred II (Claros and von 
Heijne, CABIOS applic. Notes, 10 :685-686 (1994)). 

[0564] Taken together, these data suggest that the protein of SEQ ID 

5 NO: 95 may be involved in amino acid transport. Thus, this protein may be useful in 
diagnosing and/or treating several types of disorders including, but not limited to, 
cancer, aminoacidurias, neurodegenerative diseases, anorexia, chronic fatigue, coronary 
vascular disease, diphtheria, hypoglycemia, male infertility, muscular and myopathies. 

[0565] As discussed above, the extended cDNAs of the present invention 

10 or portions thereof can be used for various purposes. The polynucleotides can be used to 
express recombinant protein for analysis, characterization or therapeutic use; as markers 
for tissues in which the corresponding protein is preferentially expressed (either 
constitutively or at a particular stage of tissue differentiation or development or in disease 
states); as molecular weight markers on Southern gels; as chromosome markers or tags 

15 (when labeled) to identify chromosomes or to map related gene positions; to compare with 
endogenous DNA sequences in patients to identify potential genetic disorders; as probes to 
hybridize and thus discover novel, related DNA sequences; as a source of information to 
derive PCR primers for genetic fingerprinting; for selecting and making oligomers for 
attachment to a "gene chip" or other support, including for examination for expression 

20 patterns; to raise anti-protein antibodies using DNA immunization techniques; and as an 
antigen to raise anti-DNA antibodies or elicit another immune response. Where the 
polynucleotide encodes a protein which binds or potentially binds to another protein (such 
as, for example, in a receptor-ligand interaction), the polynucleotide can also be used in 
interaction trap assays (such as, for example, that described in Gyuris et al., Cell 75:791- 

25 803 (1993)) to identify polynucleotides encoding the other protein with which binding 

occurs or to identify inhibitors of the binding interaction. 

[0566] The proteins or polypeptides provided by the present invention can 

similarly be used in assays to determine biological activity, including in a panel of multiple 
proteins for high-throughput screening; to raise antibodies or to elicit another immune 
30 response; as a reagent (including the labeled reagent) in assays designed to quantitatively 
determine levels of the protein (or its receptor) in biological fluids; as markers for tissues 
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in which the corresponding protein is preferentially expressed (either constitutively or at a 
particular stage of tissue differentiation or development or in a disease state); and, of 
course, to isolate correlative receptors or ligands. Where the protein binds or potentially 
binds to another protein (such as, for example, in a receptor-ligand interaction), the protein 
5 can be used to identify the other protein with which binding occurs or to identify inhibitors 

of the binding interaction. Proteins involved in these binding interactions can also be used 
to screen for peptide or small molecule inhibitors or agonists of the binding interaction. 

[0567] Any or all of these research utilities are capable of being developed 

into reagent grade or kit format for commercialization as research products. 

10 [0568] Methods for performing the uses listed above are well known to 

those skilled in the art. References disclosing such methods include without limitation 
"Molecular Cloning; A Laboratory Manual", 2d ed., Cole Spring Harbor Laboratory Press, 
Sambrook, J., E.F. Fritsch and T. Maniatis eds., 1989, and "Methods in Enzymology; 
Guide to Molecular Cloning Techniques", Academic Press, Berger, S.L. and A.R. Kimmel 

15 eds., 1987. 

[0569] Polynucleotides and proteins of the present invention can also be 

used as nutritional sources or supplements. Such uses include without limitation use as a 
protein or amino acid supplement, use as a carbon source, use as a nitrogen source and use 
as a source of carbohydrate. In such cases the protein or polynucleotide of the invention 
20 can be added to the feed of a particular organism or can be administered as a separate solid 
or liquid preparation, such as in the form of powder, pills, solutions, suspensions or 
capsules. In the case of microorganisms, the protein or polynucleotide of the invention can 
be added to the medium in or on which the microorganism is cultured. 

[0570] Although this invention has been described in terms of certain 

25 preferred embodiments, other embodiments which will be apparent to those of ordinary 
skill in the art in view of the disclosure herein are also within the scope of this invention. 
Accordingly, the scope of the invention is intended to be defined only by reference to the 
appended claims. All documents cited herein are incorporated herein by reference in their 
entirety. 

30 
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TABLE I 



SEQ ID NO. ra 

X I CSCIll. >rV|J 11II***H.1<JII 


Provisional Application Disclosing Sequence 


SEQ ID NO. in 
Provisional 
Application 


40 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


40 


41 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


41 


42 


U.S. Application No. 60/099,273, filed on September 4, 1998 


62 


43 


U.S. Application No. 60/099,273, filed on September 4, 1998 


47 


44 


U.S. Application No. 60/099,273, filed on September 4, 1998 


43 


45 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


42 


46 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


43 


47 


U.S. Application No. 60/099,273, filed on September 4, 1998 


45 


48 


U.S. Application No. 60/099,273, filed on September 4, 1998 


44 


49 


U.S. Application No. 60/099,273, filed on September 4, 1998 


50 


50 


U.S. Application No. 60/099,273, filed on September 4, 1998 


49 


51 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


44 


52 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


45 


53 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


46 


54 


U.S. Application No. 60/099,273, filed on September 4, 1998 


51 


55 


U.S. Application No. 60/099,273, filed on September 4, 1998 


59 


56 


U.S. Application No. 60/099,273, filed on September 4, 1998 


61 


57 


U.S. Application No. 60/099,273, filed on September 4, 1998 


53 


58 


U.S. Application No. 60/099,273, filed on September 4, 1998 


52 


59 


U.S. Application No. 60/099,273, filed on September 4, 1998 


54 


60 


U.S. Application No. 60/096,116, filed on August 10, 1998 


47 


61 


U.S. Application No. 60/099,273, filed on September 4, 1998 


63 


62 


U.S. Application No. 60/099,273, filed on September 4, 1998 


46 


63 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


48 


64 


U.S. Application No. 60/099,273, filed on September 4, 1998 


58 


65 


U.S. Application No. 60/099,273, filed on September 4, 1998 


56 


66 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


49 


67 


U.S. Application No. 60/099,273, filed on September 4, 1998 


57 


68 


U.S. Application No. 60/099,273, filed on September 4, 1998 


55 


69 


U.S. Application No. 60/099,273, filed on September 4, 1998 


42 


70 


U.S. Application No. 60/099,273, filed on September 4, 1998 


41 


71 


U.S. Application No. 60/099,273, filed on September 4, 1998 


48 


72 


U.S. Application No. 60/099,273, filed on September 4, 1998 


60 


73 


U.S. Application No. 60/096,1 16, filed on August 10, 1998 


50 


74 


U.S. Application No. 60/099,273, filed on September 4, 1998 


40 


75 


U.S. Application No. 60/074,121, filed on February 9, 1998 


42 


76 


U.S. Application No. 60/074,121, filed on February 9, 1998 


56 


77 


U.S. Application No. 60/074,121, filed on February 9, 1998 


57 
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SEQ ID NO. in 
Present Application 


rrOVlMuildl /\ppilC<illUU JLSlM^lUMIlg OcljUcutc 


SEQ ID NO. m 

x TV V IMUU al 

Application 


78 


U.S. Application No. 60/081,563, filed on April 13, 1998 


84 


79 


U.S. Application No. 60/081,563, filed on April 13, 1998 


69 


80 


U.S. Application No. 60/074,121, filed on February 9, 1998 


62 


81 


U.S. Application No. 60/081,563, filed on April 13, 1998 


79 


82 


U.S. Application No. 60/074,121, filed on February 9, 1998 


64 


83 


U.S. Application No. 60/081,563, filed on April 13, 1998 


51 


84 


U.S. Application No. 60/074,121, filed on February 9, 1998 


71 


130 


U.S. Application No. 60/081,563, filed on April 13, 1998 


40 


131 


U.S. Application No. 60/081,563, filed on April 13, 1998 


41 


132 


U.S. Application No. 60/081,563, filed on April 13, 1998 


42 


133 


U.S. Application No. 60/081,563, filed on April 13, 1998 


43 


134 


U.S. Application No. 60/081,563, filed on April 13, 1998 


44 


135 


U.S. Application No. 60/081,563, filed on April 13, 1998 


45 


136 


U.S. Application No. 60/081,563, filed on April 13, 1998 


46 


137 


U.S. Application No. 60/081,563, filed on April 13, 1998 


47 


138 


U.S. Application No. 60/081,563, filed on April 13, 1998 


48 


139 


U.S. Application No. 60/081,563, filed on April 13, 1998 


49 


140 


U.S. Application No. 60/081,563, filed on April 13, 1998 


50 


141 


U.S. Application No. 60/081,563, filed on April 13, 1998 


53 


142 


U.S. Application No. 60/081,563, filed on April 13, 1998 


54 


143 


U.S. Application No. 60/081,563, filed on April 13, 1998 


55 


144 


U.S. Application No. 60/081,563, filed on April 13, 1998 


56 


145 


U.S. Application No. 60/081,563, filed on April 13, 1998 


57 


146 


U.S. Application No. 60/081,563, filed on April 13, 1998 


58 


147 


U.S. Application No. 60/081,563, filed on April 13, 1998 


59 


148 


U.S. Application No. 60/081,563, filed on April 13, 1998 


60 


149 


U.S. Application No. 60/081,563, filed on April 13, 1998 


61 


150 


U.S. Application No. 60/081,563, filed on April 13, 1998 


62 


151 


U.S. Application No. 60/081,563, filed on April 13, 1998 


63 


152 


U.S. Application No. 60/081,563, filed on April 13, 1998 


64 


153 


U.S. Application No. 60/081,563, filed on April 13, 1998 


65 


154 


U.S. Application No. 60/081,563, filed on April 13, 1998 


66 
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TABLE II 



Parameters used for each step of EST analysis 





Search Characteristics 


Selection Characteristics 


Step 


Program 


Strand 


Parameters 


Identity (%)) 


Length 
(bp) 


Miscellaneous 


Blastn 


both 


S=61 X=16 


90 


17 


tRNA 


Fasta 


both 




80 


60 


rRNA 


Blastn 


both 


S=108 


80 


40 


mtRNA 


Blastn 


both 


S=108 


80 


40 


Procaryotic 


Blastn 


both 


S=144 


90 


40 


Fungal 


Blastn 


both 


S=144 


90 


40 


Alu 


fasta* 


both 




70 


40 


LI 


Blastn 


both 


S=72 


70 


40 


Repeats 


Blastn 


both 


S=72 


70 


40 


Promoters 


Blastn 


top 


S=54X=16 


90 


151 


Vertebrate 


fasta* 


both 


S=108 


90 


30 


ESTs 


Blatsn 


both 


S=108X=16 


90 


30 


Proteins 


blastxn 


top 


E=0.001 







* use "Quick Fast" Database Scanner 

± alignment further constrained to begin closer than lObp to ESTXS 1 end 
11 using BLOSUM62 substitution matrix 
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TABLE III 



Parameters used for each step of extended cDNA analysis 





Search 

characteristics 


Selection characteristics 


Step 


Program 


Strand 


Parameters 


Identity 
(%) 


Length 

(bp) 


Comments 


miscellaneou 

s* 


FASTA 


both 




90 


15 




tRNA 5 


FASTA 


both 




80 


90 




rRNA* 


BLASTN 


both 


S=108 


80 


40 




mtRNA 1 


JdJUtYo 1 IN 


DO LI1 


C~1 OS 

o — JL uo 


80 


40 




~™ 

Procaryotic 


BLASTN 


both 


S=144 


90 


40 




JT Ullgcu 


BLASTN 


both 


S=144 


90 


40 




Alu* 


BLASTN 


both 


S=72 


70 


40 


max 5 matches, 
masking 




BLASTN 


both 


S=72 


70 


40 


max 5 matches, 
masking 




Repeats 


BLASTN 


both 


S=72 


70 


40 


masking 


roiyA 


BLAST2 
N 


top 


W=6,S=10,E=10 

AA 


90 


8 


in the last 20 
nucleotides 


Polyadenylati 
on signal 




top 


A ATA A A al 
MiAAA aiJ 


owing 1 mismatch 


m ine ju 
nucleotides 
preceding the 5 1 
end of the polA 


Vertibrate* 


BLASTN 

then 
FASTA 


both 




90 then 70 


30 


first BLASTN and 
then FASTA on 
matching 
sequences 


ESTs* 


BLAST2 
N 


both 




90 


30 




Geneseq 


BLASTN 


both 


W=8, B=10 


90 


30 




OKF 


BLASTP 


top 


W=8, B=10 






on ORF proteins, 
max 10 matches 


Proteins* 


BLASTX 


top 


E=0.001 


70 


30 





s steps common to EST analysis and using the same algorithms and parameters 
* steps also used in EST analysis but with different algorithms and/or parameters 
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TABLE IV 



Id 


FCS 
Location 


SigPep 
Location 


Mature 
Polypeptide 
Location 


Stop 
Codon 
Location 


PolyA Signal 
Location 


PolyA Site 
Location 


40 


35 through 
568 


35 through 100 


101 through 
568 


569 


667 through 
672 


685 through 
699 


41 


68 through 
337 


68 through 124 


125 through 
337 


338 


462 through 
467 


482 through 
497 


42 


39 through 
413 


39 through 83 


84 through 
413 


414 


566 through 
571 


583 through 
598 


43 


235 through 
642 


235 through 
336 


337 through 
642 


643 


1540 through 
1545 


1564 through 
1579 


44 


42 through 
755 


42 through 200 


201 through 
755 


756 


860 through 
865 


878 through 
893 


45 


23 through 
340 


23 through 235 


236 through 
340 


341 


611 through 
616 


629 through 
644 


46 


12 through 
380 


12 through 263 


264 through 
380 


381 




523 through 
538 


47 


8 through 232 


8 through 154 


155 through 
232 


233 




737 through 
752 


48 


183 through 
422 


183 through 
302 


303 through 
422 


423 


505 through 
510 


523 through 
537 


49 


24 through 
1004 


24 through 170 


171 through 
1004 


1005 


" 


1586 through 
1602 


50 


80 through 
784 


80 through 139 


140 through 
784 


785 


910 through 
915 


933 through 
948 


51 


67 through 
222 


67 through 159 


160 through 
222 


223 


~ 


673 through 
687 


52 


46 through 
732 


46 through 186 


187 through 
732 


733 


781 through 
786 


806 through 
821 


53 


81 through 
356 


81 through 152 


153 through 
356 


357 


406 through 
411 


429 through 
445 


54 


72 through 
1346 


72 through 140 


141 through 
1346 


1347 


1482 through 
1487 


1502 through 
1517 


55 


194 through 
454 


194 through 
379 


380 through 
454 


455 




1545 through 
1560 


56 


48 through 
494 


48 through 347 


348 through 

ACiA 

494 


495 


1031 through 
103o 


1051 through 
luoo 


57 


111 through 
671 


1 1 1 through 
215 


216 through 
671 


672 


990 through 
995 


1045 through 
1061 


58 


5 through 373 


5 through 82 


83 through 
373 


374 


1986 through 
1991 


2010 through 
2025 


59 


14 through 
472 


14 through 319 


320 through 
472 


473 


555 through 
560 


576 through 
591 


60 


2 through 217 




2 through 217 


218 


489 through 


529 through 
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Id 


FCS 
Location 


SigPep 
Location 


Mature 
Polypeptide 
Location 


Stop 
Codon 
Location 


PoIyA Signal 
Location 


PoIyA Site 
Location 












494 


544 


61 


51 through 
575 


51 through 110 


111 through 
575 


576 


1653 through 
1658 


1674 through 
1689 


62 


69 through 
977 


69 through 128 


129 through 
977 


978 


1076 through 
1081 


1096 through 
1111 


63 


44 through 
238 


44 through 160 


161 through 
238 


239 


443 through 
448 


540 through 
554 


64 


1 14 through 
524 


114 through 
164 


165 through 
524 


525 


1739 through 
1744 


1758 through 
1773 


65 


26 through 
487 


26 through 64 


65 through 
487 


488 


883 through 
888 


901 through 
917 


66 


80 through 
388 


80 through 187 


188 through 
388 


389 


609 through 
614 


627 through 
641 


67 


186 through 
443 


186 through 
407 


408 through 
443 


444 


827 through 
832 


839 through 
854 


68 


75 through 
1259 


75 through 
1004 


1005 through 
1259 


1260 


1536 through 
1541 


1553 through 
1568 


69 


98 through 
376 


98 through 151 


152 through 
376 


377 


471 through 
476 


491 through 
506 


70 


72 through 
254 


72 through 134 


135 through 
254 


255 


506 through 
511 


528 through 
542 


71 


148 through 
1140 


148 through 
240 


241 through 
1140 


1141 


1590 through 
1595 


1614 through 
1629 


72 


109 through 
738 


109 through 
405 


406 through 
738 


739 


1633 through 
1638 


1650 through 
1665 


73 


55 through 
291 


55 through 255 


256 through 
291 


292 


390 through 
395 


410 through 
425 


74 


25 through 
276 


~ 


25 through 
276 


277 


508 through 
513 


533 through 
546 


75 


32 through 
307 


32 through 91 


92 through 
307 


308 


452 through 
457 


472 through 
485 


76 


46 through 
675 


46 through 87 


88 through 
675 


676 


1363 through 
1368 


1382 through 
1394 


77 


329 through 
943 


329 through 
745 


746 through 
943 


944 




1322 through 
1333 


78 


27 through 

281 


27 through 77 


78 through 
281 


282 




- 


79 


61 through 
405 


61 through 213 


214 through 
405 


406 


675 through 
680 


692 through 
703 


80 


137 through 
379 


137 through 
229 


230 through 
379 


380 


728 through 
733 


755 through 
768 


81 


37 through 


37 through 153 


154 through 


742 


969 through 


994 through 
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Id 


FCS 
Location 


SigPep 
Location 


Mature 
Polypeptide 
Location 


Stop 
Codon 
Location 


PolyA Signal 
Location 


PolyA Site 
Location 




741 




741 




974 


1007 


82 


80 through 
265 


80 through 142 


143 through 
265 


266 


491 through 
496 


517 through 
527 


83 


612 through 
644 


- 


612 through 
644 


645 


829 through 
834 


850 through 
861 


84 


61 through 
228 


61 through 162 


163 through 
228 


229 


208 through 
213 


- 


130 


15 through 
311 


15 through 110 


111 through 
311 


312 


507 through 
512 


531 through 
542 


131 


50 through 
529 


50 through 130 


131 through 
529 


530 


877 through 
882 


899 through 
909 


132 


240 through 
416 


240 through 
305 


306 through 
416 


417 


1117 through 
1122 


11 39 through 
1149 


133 


1 1 1 through 
446 


111 through 
254 


255 through 
446 


447 


890 through 
895 


909 through 
921 


134 


123 through 
455 


123 through 
290 


291 through 
455 


456 


886 through 
891 


904 through 
916 


135 


2 through 433 


2 through 232 


233 through 
433 


434 


488 through 
493 


510 through 
520 


136 


34 through 
363 


34 through 87 


88 through 
363 


364 


536 through 
541 


558 through 
568 


137 


50 through 
286 


50 through 157 


158 through 
286 


287 


385 through 
390 


405 through 
416 


138 


50 through 
637 


50 through 151 


152 through 
637 


638 




1277 through 
1289 


139 


72 through 
602 


72 through 125 


126 through 
602 


603 


- 


704 through 
715 


140 


120 through 
434 


120 through 
185 


186 through 
434 


435 


899 through 
904 


918 through 
931 


141 


4 through 447 


4 through 147 


148 through 
447 


448 


858 through 
863 


880 through 
891 


142 


28 through 
804 


28 through 96 


97 through 
804 


805 




806 through 
817 


143 


27 through 
359 


27 through 212 


213 through 
359 


360 


988 through 
993 


1009 through 
1020 


144 


25 through 
957 


25 through 93 


94 through 
957 


958 


1368 through 
1373 


1388 through 
1399 


145 


47 through 
319 


47 through 226 


227 through 
319 


320 




656 through 
666 


146 


80 through 
940 


80 through 130 


131 through 
940 


941 


1101 through 
1106 


11 19 through 
1130 


147 


146 through 


146 through 


293 through 


458 


442 through 


465 through 
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Id 


res 

Location 


SigPep 
Location 


Mature 
Polypeptide 
Location 


Stop 
Codon 
Location 


PolyA Signal 
Location 


PolyA Site 
Location 




457 


292 


457 




447 


475 


148 


100 through 
351 


100 through 
207 


208 through 
351 


352 


- 


940 through 
949 


149 


177 through 
569 


177 through 
236 


237 through 
569 


570 


- 


931 through 
939 


150 


67 through 
459 


67 through 135 


136 through 
459 


460 


856 through 
861 


875 through 
887 


151 


65 through 
1069 


65 through 112 


113 through 
1069 


1070 


1978 through 
1983 


1999 through 
2010 


152 


70 through 
321 


70 through 234 


235 through 
321 


322 


364 through 
369 


375 through 
387 


153 


38 through 
877 


38 through 91 


92 through 
877 


878 


947 through 
952 


974 through 
983 


154 


51 through 
470 


51 through 203 


204 through 
470 


471 


1585 through 
1590 


1604 through 
1614 
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TABLE V 



Id 


Full Length Polypeptide 
Location 


Signal Peptide 
Location 


Mature Polypeptide 
Location 


85 


-22 through 156 


-22 through -1 


1 through 156 


86 


-19 through 71 


-19 through -1 


1 through 71 


87 


-15 through 110 


-15 through -1 


1 through 110 


88 


-34 through 102 


-34 through -1 


1 through 102 


89 


-53 through 185 


-53 through -1 


1 through 185 


90 


-71 through 35 


-71 through -1 


1 through 35 


91 


-84 through 39 


-84 through -1 


1 through 39 


92 


-49 through 26 


-49 through -1 


1 through 26 


93 


-40 through 40 


-40 through -1 


1 through 40 


94 


-49 through 278 


-49 through -1 


1 through 278 


95 


-20 through 215 


-20 through -1 


1 through 215 


96 


-31 through 21 


-31 through -1 


1 through 21 


97 


-47 through 182 


-47 through -1 


1 through 182 


98 


-24 through 68 


-24 through -1 


1 through 68 


99 


-23 through 402 


-23 through -1 


1 through 402 


100 


-62 through 25 


-62 through -1 


1 through 25 


101 


-100 through 49 


-100 through -1 


1 through 49 


102 


-35 through 152 


-35 through -1 


1 through 152 


103 


-26 through 97 


-26 through -1 


1 through 97 


104 


-102 through 51 


-102 through -1 


1 through 5 1 


105 


1 through 72 


- 


1 through 72 


106 


-20 through 155 


-20 through -1 


1 through 155 


107 


-20 through 283 


-20 through -1 


1 through 283 


108 


-39 through 26 


-39 through -1 


1 through 26 


109 


-17 through 120 


-17 through -1 


1 through 120 


110 


-13 through 141 


-13 through -1 


1 through 141 


111 


-36 through 67 


-36 through -1 


1 through 67 


112 


-74 through 12 


-74 through -1 


1 through 12 


113 


-310 through 85 


-310 through -1 


1 through 85 


114 


-18 through 75 


-18 through -1 


1 through 75 


115 


-21 through 40 


-21 through -1 


1 through 40 


116 


-31 through 300 


-31 through -1 


1 through 300 


117 


-99 through 111 


-99 through -1 


1 through 1 1 1 


118 


-67 through 12 


-67 through -1 


1 through 12 


119 


1 through 84 




1 through 84 


120 


-20 through 72 


-20 through -1 


1 through 72 
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Id 


Full Length Polypeptide 
Location 


Signal Peptide 
Location 


Mature Polypeptide 
jLocauon 


121 


-14 through iyo 


-14 xnrougn -1 


1 tVirriiirrV* 1 

i inrougn ivo 


ill 


-l3y tnrougn 00 


inrougn -1 


1 through 66 


123 


-1 / tnrougn 68 


-17 through -1 


i tnrougn oo 


1 'O. A 

124 


-j 1 tnrougn 64 


-51 through -1 


i tnrougn 04 


125 


-31 tnrougn jU 


-3 1 through -1 


1 tnrougn !>U 


1 O /" 

126 


-39 tnrougn 19o 


-39 through -1 


1 tnrougn lyo 


127 


-21 tnrougn 41 


-Zl through -l 


1 tnrougn 4 1 


128 


1 tnrougn 1 1 




1 through 1 1 


129 


-34 through 22 


-34 through -1 


1 through ZZ 


155 


-32 through 0/ 


-31 through -1 


1 through o/ 


156 


-27 through 133 


-27 through -1 


1 through 133 


157 


-22 through 37 


-22 through -1 


1 through 3 / 


158 


-48 through 64 


-4o tnrougn -1 


1 through 64 


159 


-56 through 55 


-56 through -1 


1 through 55 


160 


■"7 —7 j-t_ „ „ „ „ O 

-77 through 67 


-77 through -1 


1 through 67 


1 y 1 

lol 


-18 through 92 


1 O -4-1**. A -■ j*nJ1« *t 

-18 through -1 


1 through 92 


162 


-36 through 43 


-36 through -1 


1 through 43 


163 


-34 through 162 


-34 through -1 


1 j.1 i_ 1 /f<-» 

1 through 162 


164 


1 O j_1 1 1 CC\ 

-18 through 159 


-18 through -1 


1 through 159 


165 


A A :1 TOO 

-22 through 83 


n j.i_ i_ 1 
-22 through -1 


1 through 83 


166 


-48 through 100 


-48 through -1 


1 through 100 


167 


-23 through 236 


-23 through -1 


1 through 236 


168 


-62 through 49 


/-^v j_1 1 1 

-62 through -l 


1 through 49 


169 


-23 through 288 


-23 through -1 


1 j.1 „ 1 ^OO 

1 through 288 


170 


-60 through 3 1 


-60 through - I 


1 j.1 1 O 1 

l through 3 1 


171 


-17 through 270 


-17 through -1 


1 through 270 


172 


-49 through 55 


-49 through -1 


1 through 55 


173 


-36 through 48 


-36 through -1 


1 through 48 


1 /4 


-2U tnrougn 1 1 1 


-zu tnrougn - 1 


1 through ill 


175 


-23 through 108 


-23 through -1 


1 through 108 


176 


-16 through 319 


-16 through -1 


1 through 319 


177 


-55 through 29 


-55 through -1 


1 through 29 


178 


-18 through 262 


-18 through -1 


1 through 262 


179 


-51 through 89 


-51 through -1 


1 through 89 
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TABLE VI 



Id 


Collection refs 


Deposit Name 


40 


ATCC# 98921 


SignalTag 121-144 


41 


ATCC# 98921 


SignalTag 121-144 


42 


ATCC# 98919 


SignalTag 145-165 


43 


ATCC# 98919 


SignalTag 145-165 


44 


ATCC# 98919 


SignalTag 145-165 


45 


ATCC# 98921 


SignalTag 121-144 


46 


ATCC# 98921 


SignalTag 121-144 


47 


ATCC# 98919 


SignalTag 145-165 


48 


ATCC# 98919 


SignalTag 145-165 


49 


ATCC# 98919 


SignalTag 145-165 


50 


ATCC# 98919 


SignalTag 145-165 


51 


ATCC# 98921 


SignalTag 121-144 


52 


ATCC# 98921 


SignalTag 121-144 


53 


ATCC# 98921 


SignalTag 121-144 


54 


ATCC# 98919 


SignalTag 145-165 


55 


ATCC# 98919 


SignalTag 145-165 


56 


ATCC# 98919 


SignalTag 145-165 


57 


ATCC# 98919 


SignalTag 145-165 


58 


ATCC# 98919 


SignalTag 145-165 


59 


ATCC# 98919 


SignalTag 145-165 


60 


ATCC# 98921 


SignalTag 121-144 


61 


ATCC# 98919 


SignalTag 145-165 


62 


ATCC# 98919 


SignalTag 145-165 


63 


ATCC# 98921 


SignalTag 121-144 


64 


ATCC# 98919 


SignalTag 145-165 


65 


ATCC# 98919 


SignalTag 145-165 


66 


ATCC# 98921 


SignalTag 121-144 


67 


ATCC# 98919 


SignalTag 145-165 


68 


ATCC# 98919 


SignalTag 145-165 


69 


ATCC# 98919 


SignalTag 145-165 


70 


ATCC# 98919 


SignalTag 145-165 


71 


ECACC# XXXX 


Signal Tag 28011 999 


72 


ECACC# XXXX 


Signal Tag 28011 999 


73 


ECACC#XXXX 


Signal Tag 28011 999 


74 


ECACC#XXXX 


Signal Tag 28011 999 


75 


ECACC#XXXX 


SignalTag 28011 999 


76 


ECACC#XXXX 


Signal Tag 28011 999 


77 


ECACC# XXXX 


Signal Tag 2801 1 999 


78 


ECACC# XXXX 


Signal Tag 28011 999 


79 


ECACC#XXXX 


Signal Tag 28011 999 


80 


ECACC#XXXX 


Signal Tag 28011 999 


81 


ECACC#XXXX 


Signal Tag 28011 999 


82 


ECACC# XXXX 


Signal Tag 28011 999 


83 


ECACC#XXXX 


Signal Tag 28011 999 


84 


ECACC#XXXX 


Signal Tag 28011 999 
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TABLE VII 



Internal designation 


lu 


Type of sequence 


1 AO AAO C H D1 CT 


40 


TYNT A 


1 0o-002o-0-r j-rL 


A 1 

41 


LAIN A 


1 AO AAO C A f/t T7T 

1 Uo-UUZ- J -0-r 4-r L 


42 


TYM A 


1 AO AAO C A A O CT 

10o-003-5-0-A5-rL 


43 


JJJNA 


108-003 -5 -0-D2-rL 


A A 

44 


DIN A 


1 AO AA"> C A f? C T7T 

1 06-003 -5-0-E5 -r L 


45 


DNA 


1 AO AAI C A TJT^ "CT 

108-003 -5 -O-HZ-rL 


46 


T\-\T A 

DJNA 


1 AO A A /I C A T~) ^7 T7T 

108-004-5-0-B7-rL 


4/ 


DJNA 


1 AO AA/f C A r^Q CT 

108-004-5-0-Go-JrL 


/I o 

4o 


TYN.T A 
DJNA 


1 AO AA/t C A T~\ 1 A T?T 

1 08-004-5 -0-D 1 0-r L 


A A 

4y 


T\XT A 

DJNA 


1 AO A A A C A T7 0 T7T 

1 08-004-5-0-E8-r L 


C A 

50 


TWT A 

DJNA 


1 AO AA/t C A X2< T7T 

1 0 8-004-5 -0-r >rL 


51 


"T\T\T A 

DJNA 


1AO AA/f C A f T?T 


52 


r\\T A 

DJNA 


1 AO AA£* C A r> 1 1 T7T 

1 08-005-5 -0-B 1 


53 


TVXT A 


i AO A AC C A /"M T7T 

1 08-005 -5 -0-Cl-rL 


54 


DNA 


1 AO AAC C A "CI 1 T?T 

108-005-5-O-F1 1-FL 


55 


TWT A 

DNA 


1 AO AAC C AT - '/' TT 

108-005-5-0-F6-FL 


56 


a\t a 

DNA 


108-006-5-0-C2-FL 


57 


T»-\ T A 

DNA 


1AO AA A C A T/T T7T 

108-006-5-0-Eo-rL 


c o 

58 


A\T A 

DNA 


1 08-006-5 -0-G2-FL 


59 


a-\t a 

DNA 


1 AO AAA £" A / — 1 /( T'T 

108-006-5-0-G4-FL 


^TA 

60 


DNA 


108-008-5-0-A6-FL 


61 


DNA 


1AO A AO C A AO T'T 

108-008-5-0-A8-FL 


62 


DNA 


108-008-5-0-C10-FL 


63 


DNA 


108-008-5-0-E6-FL 


64 


DNA 


1 AD AAO iT A T f TT 

1 08-008-5 -0-F6-FL 


65 


DNA 


1 08-008-5 -0-G 1 2-FL 


66 


DNA 


■i f\d f\f\ CI i~ ' A <* — A A TIT 

108-008-5-0-G4-FL 


67 


DNA 


i r\Ci AAA r~ A A ^ T^T 

1 08-009-5 -0-A2-FL 


68 


DNA 


-1 A O A 1 O ^ A / — I 1 r\ T*T 

108-013-5-0-C12-FL 


/"A 

69 


DNA 


108-013-5-0-G11-FL 


70 


DNA 


108-003 -5 -0-E4-FL 


71 


DNA 


108-005-5-0-D6-FL 


72 


DNA 


i n r> AAA c a /■ — t -~> Tr~>T 

108-008-5-0-G3-FL 


73 


DNA 


108-013-5-0-B5-FL 


74 


DNA 


26-44-1 -B5-CL3 1 


75 


DNA 


47-4-4-C6-CL2 3 


76 


TW.T A 

DNA 


47-40-4-G9-CL1 1 


77 


DNA 


48-25-4-D8-CL1 7 


78 


DNA 


48-28-3-A9-CL0 1 


79 


DNA 


51-25-1-A2-CL3 1 


80 


DNA 


55-10-3-F5-CL0_3 


81 


DNA 


57-19-2-G8-CL1 3 


82 


DNA 


58-34-2-H8-CLl_3 


83 


DNA 


76-13-3-A9-CLl_l 


84 


DNA 


78-7-2-B8-FL1 


130 


DNA 
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Internal designation 


lu 


Tvnp nf seauence 


t-i r> A TT\ f~?T 1 

77-8-4-F9-rLI J 


. J I 


DNA 


58-8-1 -F2-FL2 




DNA 


77-13-1-A7-FL2 




DNA 


47-2-3-G9-rLl 


1 34 


DNA 


33-75-4-H7-FL1 


1 3S 


DNA 


51-41-1-F10-FL1 




DNA 


48-51-4-C11-FL1 


1 ^7 


DNA 


33-58-3-C8-FL1 


1 1% 


DNA 


76-20-4-C11-FL1 


1 

1 J7 


DNA 


76-28-3-A12-FL1 


1 AO 


DNA 


76-25-4-Fll-FLi 


1 A1 


DNA 


58-20-4-G7-FL1 


1 A? 


DNA 


33-54-1 -B9-FL1 


1 A1 


DNA 


76-20-3 -Hl-F LI 


1 AA 


DNA 


47-20-2-G3-FL1 


1 A^ 


DNA 


78-25-1-H11-FLI 


1 A£ 


DNA 


78-6-2-B10-tLl 


1 A7 


DNA 


58_49-3-G10-FLl 


1 A8 
14o 


DNA 


78-21-1-B7-FL1 


1 AO 


DNA 


57-28-4-B12-FLI 


i jV 


DNA 


33-77-4-E2-PL1 




DNA 


58-19-3-D3-FL2 


1 ^9 
1 


DNA 


37-7-4-E7-FL1 


1 ^ 


DNA 


60-14-2-HlO-FLl 


1 ^/t 
1 M 


DNA 


1 08-002-5 -0-Bl-FL 




PRT 


1 08-002-5 -0-F3-FL 


oD 


PRT 


108-002-5-0-F4-FL 


o / 


PRT 


108-003-5-0-A8-FL 


00 


PRT 

1 XV. X 


108-003-5-0-D2-FL 


CO 


PRT 

X XX. X 


108-003-5-0-E5-FL 


on 


PRT 


108-003-5-0-H2-FL 


Q1 

y 1 


PRT 

x xx. x 


108-004-5-0-B7-FL 


oo 


PRT 

x xv x 


108-004-5-0-C8-FL 




PRT 

X XV X 


1 08-004-5 -0-DlO-FL 


OA 


PRT 


108-004-5-0-E8-FL 


yj 


PRT 

X XV X 


108-004-5-0-F5-FL 


yO 


PRT 


1 08-004-5-0-G6-FL 


07 


PRT 


108-005-5-0-B11-FL 


OQ 


PRT 

X XV X 


108-005-5-0-Cl-JeL 


00 

yy 


PRT 


10R-005-5-0-F11-FL 


i nn 

1UU 


PRT 

x xv x 


108-005-5-0-F6-tL 


1 m 

iUl 


PRT 


108-006-5-0-C2-FL 


1 

lUZ 


PRT 

X IV i 


108-006-5-0-E6-FL 


1 m 


PRT 

X XV X 


1 OR OO6-5-0-G2-FL 


104 


PRT 


108-006-5-0-G4-FL 


105 


PRT 


108-008-5-0-A6-FL 


ioe 


; PRT 


1 08-008-5 -0-A8-FL 


10; 


I PRT 


1O8-008-5-0-C10-FL 


\0i 


* PRT 


108-008-5-0-E6-FL 


10? 


) PRT 
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Internal designation 


Id 


Type of sequence 


1 AO AAO C A T?^ T7T 

108-008-5-0-ro-rL 


1 1 A 

110 


PR1 


1 08-008-5 -0-G 1 2-FL 


111 


PRT 


1 ao aao c r\ ^ a rr 

l084)08-5-0-Cj4-rL 


1 12 


PRT 


1AO AAA C A AO T?T 

1 08-009-5 -U-Az-r L 


1 1 1 

113 


PRi 


1 A O A1 O C A 1 O T'T 

108-013-5 -0-C 1 2 -FL 


1 1 A 

114 


PRT 


1/\0 A 1 O A /— < 1 1 T'T 

1 08-01 3 -5 -0-G 1 1-FL 


115 


PRT 


108-003-5-0-E4-FL 


116 


PRT 


108-005-5-0-D6-FL 


117 


PRT 


108-008-5-0-G3-FL 


118 


PRT 


108-013-5-0-B5-FL 


119 


PRT 


26-44- 1-B5-CL3 1 


120 


PRT 


47-4-4-C6-CL2 3 


121 


PRT 


47-40-4-G9-CL1 1 


122 


PRT 


48-25-4-D8-CL1 7 


123 


PRT 


48-28-3-A9-CL0 1 


124 


PRT 


51-25-1-A2-CL3 1 


125 


PRT 


55-10-3-F5-CL0 3 


126 


PRT 


57-1 9-2 -G8-CL1 3 


127 


PRT 


58-34-2-H8-CL1 3 


128 


PRT 


76-13-3-A9-CLl_l 


129 


PRT 


78-7-2-B8-FL1 


155 


PRT 


77-8-4-F9-FL1 


156 


PRT 


58-8-1-F2-FL2 


157 


PRT 


77-13-1-A7-FL2 


158 


PRT 


47-2-3-G9-FL1 


159 


PRT 


33-75-4-H7-FL1 


160 


PRT 


51-41-1-F10-FL1 


161 


PRT 


48-51-4-C11-FL1 


162 


PRT 


33-58-3-C8-FL1 


163 


PRT 


76-20-4-C11-FL1 


164 


PRT 


76-28-3 -A 1 2-FL 1 


165 


PRT 


76-25-4-F11-FL1 


166 


PRT 


58-20-4-G7-FL1 


167 


PRT 


33-54-1-B9-FL1 


168 


PRT 


76-20-3-H1-FL1 


169 


PRT 


47-20-2-G3-FL1 


170 


PRT 


78-25-1-H11-FL1 


171 


PRT 


78-6-2-B10-FL1 


172 


PRT 


58-49-3-G10-FL1 


173 


PRT 


78-21-1-B7-FL1 


174 


PRT 


57-28-4-B12-FL1 


175 


PRT 


33-77-4-E2-FL1 


176 


PRT 


58-19-3-D3-FL2 


177 


PRT 


37-7-4-E7-FL1 


178 


PRT 


60-14-2-H10-FL1 


179 


PRT 
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TABLE VIII 



Id 


Locations 


PROSITE signature 
Name 


89 


205-226 


Leucine zipper 


95 


5-66 


Amino acid permease 


103 


46-67 


Leucine zipper 


113 


259-280 


Leucine zipper 


120 


27-40 


MAT8 family 


122 


123-125 


Cell attachment sequence 
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